New Biological Entities and the Use Thereof

The present invention provides engineered enzymes comprised of a protein 
scaffold and Specificity Determining Regions, the production of such enzymes 
and the use thereof for therapeutic, research, diagnostic, nutritional care, 
personal care and industrial purposes. 

Background 

Academic and industrial research continuously searches for functional proteins to 
be used as therapeutic, research, diagnostic, nutritional, personal care or 
industrial agents. Today, such functional proteins can be classified mainly into 
two categories: natural proteins and engineered proteins. Natural proteins, on 
the one hand, are discovered from nature, e.g. by screening natural isolates or 
by sequencing genomes from diverse species. Engineered proteins, on the other 
hand, are typically based on known proteins and are altered in order to acquire 
modified functionalities. The present invention discloses engineered proteins with 
novel functions as compared to the starting components. Such proteins are 
called NBEs (New Biologic Entities). The NBEs disclosed in the present invention 
are engineered enzymes with novel substrate specificities or fusion proteins of 
such engineered enzymes with other functional components. 

Specificity is an essential element of enzyme function. A cell consists of 
thousands of different, highly reactive catalysts. Yet the cell is able to maintain a 
coordinated metabolism and a highly organized three-dimensional structure. This 
is due in part to the specificity of enzymes, i.e. the selective conversion of their 
respective substrates. Specificity is a qualitative and a quantitative property: the 
specificity of a particular enzyme can vary widely, ranging from just one 
particular type of target molecules to all molecular types with certain chemical 
substructures. In nature, the specificity of an organism's enzymes has
evolved to meet the particular needs of the organism. Arbitrary specificities with high
value for therapeutic, research, diagnostic, nutritional or industrial applications 
are unlikely to be found in any organism's enzymatic repertoire due to the large 
space of possible specificities. The only realistic way of obtaining such 
specificities is their generation de novo.

When comparing enzymes with binders, a paradigm of specificity is given by
antibodies recognizing individual epitopes as small distinct structures within large 
molecules. The naturally occurring vast range of antibody specificities is 
attributed to the diversity generated by the immune system combined with 
natural selection. Several mechanisms contribute to the vast repertoire of 
antibody specificity and occur at different stages of immune response generation 
and antibody maturation (Janeway, C et al. (1999) Immunobiology, Elsevier 
Science Ltd., Garland Publishing, New York). Specifically, antibodies contain 
complementarity determining regions (CDRs) which interact with the antigen in a 
highly specific manner and allow discrimination even between very similar 
epitopes. The light as well as the heavy chain of the antibody each contribute 
three CDRs to the binding domain. Nature uses recombination of various gene 
segments combined with further mutagenesis in the generation of CDRs. As a 
result, the sequences of the six CDR loops are highly variable in composition and 
length and this forms the basis for the diversity of binding specificities in 
antibodies. A similar principle for the generation of a diversity of catalytic 
specificities is not known from nature. 

Catalysis, i.e. the increase of the rate of a specific chemical reaction, is, besides
binding, the most important protein function. Catalytic proteins, i.e. enzymes, are
classified according to the chemical reaction they catalyze. 

Transferases are enzymes transferring a group, for example, the methyl group or 
a glycosyl group, from one compound (generally regarded as donor) to another 
compound (generally regarded as acceptor). For example, glycosyltransferases 
(EC 2.4) transfer glycosyl residues from a donor to an acceptor molecule. Some 
of the glycosyltransferases also catalyze hydrolysis, which can be regarded as 
transfer of a glycosyl group from the donor to water. The subclass is further 
subdivided into hexosyltransferases (EC 2.4.1), pentosyltransferases (EC 2.4.2) 
and those transferring other glycosyl groups (EC 2.4.99, Nomenclature 
Committee of the International Union of Biochemistry and Molecular Biology (NC- 
IUBMB)). 

Oxidoreductases catalyze oxido-reductions. The substrate that is oxidized is 
regarded as hydrogen or electron donor. Oxidoreductases are classified as 
dehydrogenases, oxidases, mono- and dioxygenases. Dehydrogenases transfer
hydrogen from a hydrogen donor to a hydrogen acceptor molecule. Oxidases
react with molecular oxygen as hydrogen acceptor and produce oxidized products
as well as either hydrogen peroxide or water. Monooxygenases transfer one
oxygen atom from molecular oxygen to the substrate while the other is reduced to
water. In contrast, dioxygenases catalyze the insertion of both oxygen atoms from
molecular oxygen into the substrate.

Lyases catalyze elimination reactions and thereby generate double bonds or, in
the reverse direction, catalyze additions at double bonds. Isomerases
catalyze intramolecular rearrangements. Ligases catalyze the formation of 
chemical bonds at the expense of ATP consumption. 
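As a minimal sketch of how the enzyme classes named in this section map onto EC numbers, the lookup below pairs the six top-level EC classes (including the hydrolases, EC 3, discussed next) with the glycosyltransferase sub-subclasses quoted above. The function name describe_ec and the dictionary names are illustrative assumptions and do not appear in the source.

```python
# Top-level EC classes corresponding to the enzyme classes named in the text.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}

# Glycosyltransferase sub-subclasses quoted in the text (EC 2.4.x).
GLYCOSYLTRANSFERASE_SUBCLASSES = {
    "2.4.1": "hexosyltransferases",
    "2.4.2": "pentosyltransferases",
    "2.4.99": "transferring other glycosyl groups",
}


def describe_ec(ec_number: str) -> str:
    """Return a short description for an EC number such as 'EC 2.4.1'.

    Hypothetical helper for illustration only.
    """
    digits = ec_number.replace("EC", "").strip()
    parts = digits.split(".")
    top = EC_CLASSES[int(parts[0])]
    # Only the first three fields are needed to find a sub-subclass.
    sub = GLYCOSYLTRANSFERASE_SUBCLASSES.get(".".join(parts[:3]))
    return f"{top} ({sub})" if sub else top


print(describe_ec("EC 2.4.1"))   # a hexosyltransferase entry
print(describe_ec("3.4.21.4"))   # trypsin, a hydrolase
```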

Finally, hydrolases are enzymes that catalyze the hydrolysis of chemical bonds 
like C-O or C-N. The EC classification for these enzymes generally classifies
them by the nature of the bond hydrolysed and by the nature of the substrate.
Hydrolases such as lipases and proteases play an important role in nature as well
as in technical applications of biocatalysts. Proteases hydrolyse a peptide bond
within the context of an oligo- or polypeptide. Depending on the catalytic
mechanism, proteases are grouped into aspartic, serine, cysteine, metallo- and
threonine proteases (Handbook of proteolytic enzymes. (1998) Eds: Barrett, A.;
Rawlings, N.; Woessner, J.; Academic Press, London). This classification is based
on the amino acid side chains that are responsible for catalysis and which are 
typically presented in the active site in very similar orientation to each other. The 
scissile bond of the substrate is brought into register with the catalytic residues 
due to specific interactions between the amino acid side chains of the substrate 
and complementary regions of the protease (Perona, J. & Craik, C (1995) Protein 
Science, 4, 337-360). The residues on the N- and C-terminal side of the scissile
bond are usually called P1, P2, P3 etc. and P1', P2', P3', and the binding pockets
complementary to the substrate S1, S2, S3 and S1', S2', S3', respectively
(nomenclature according to Schechter & Berger, Biochem. Biophys. Res.
Commun. 27 (1967) 157-162). The selectivity of proteases can vary widely from
being virtually nonselective - e.g. the subtilisins - over a strict preference at the
P1 position - e.g. trypsin selectively cutting on the C-terminal side of arginine or
lysine residues - to highly specific proteases - e.g. human tissue-type
plasminogen activator (t-PA) cleaving at the C-terminal side of the arginine in
the sequence CPGRVVGG (Ding, L. et al. (1995) Proc. Natl. Acad. Sci. USA 92,
7627-7631; Coombs, G. et al. (1996) J. Biol. Chem. 271, 4461-4467).
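The position nomenclature described above can be sketched in a few lines of code. The following is a minimal illustration, assuming a simplified trypsin-like rule (cleavage on the C-terminal side of every lysine or arginine) and a hypothetical example peptide; the function names label_positions and trypsin_like_sites are assumptions introduced here and are not part of the source.

```python
def label_positions(peptide: str, cut_after: int) -> dict:
    """Label residues around a scissile bond.

    cut_after is the 0-based index of the residue on the N-terminal side
    of the scissile bond, i.e. the P1 residue.
    """
    labels = {}
    # N-terminal side: P1 is adjacent to the bond, P2 one residue further out, ...
    for offset, i in enumerate(range(cut_after, -1, -1), start=1):
        labels[f"P{offset}"] = peptide[i]
    # C-terminal side: P1' is adjacent to the bond, P2' one residue further out, ...
    for offset, i in enumerate(range(cut_after + 1, len(peptide)), start=1):
        labels[f"P{offset}'"] = peptide[i]
    return labels


def trypsin_like_sites(peptide: str) -> list:
    """Indices of P1 residues under a simplified rule: cut after K or R."""
    # The last residue has no peptide bond on its C-terminal side.
    return [i for i, aa in enumerate(peptide[:-1]) if aa in "KR"]


peptide = "GAKSLRVT"  # hypothetical peptide, one-letter amino acid code
for site in trypsin_like_sites(peptide):
    print(site, label_positions(peptide, site))
```

A highly specific protease would, in this picture, require particular residues at several P and P' positions at once rather than at P1 alone.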

The specificity of proteases, i.e. their ability to recognize and hydrolyze 
preferentially certain peptide substrates, can be expressed qualitatively and 
quantitatively. Qualitative specificity refers to the kind of amino acid residues 
that are accepted by a protease at certain positions of the peptide substrate. For 
example, trypsin and t-PA are related with respect to their qualitative specificity, 
since both of them require at the P1 position an arginine or a similar residue. On
the other hand, quantitative specificity refers to the relative number of peptide
substrates that are accepted as substrates by the protease, or more precisely, to
the relative kcat/KM ratios of the protease for the different peptides that are
accepted by the protease. Proteases that accept only a small portion of all 
possible peptides have a high specificity, whereas the specificity of proteases 
that, as an extreme, cleave any peptide substrate would theoretically be zero. 
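To make the quantitative notion concrete, the sketch below normalizes kcat/KM values across a peptide panel and condenses them into a single index that is 0 when every peptide is cleaved equally well and 1 when only one peptide is accepted. The index formula, the function names and all numerical values are illustrative assumptions; the source does not define a specific formula.

```python
def relative_ratios(kcat_over_km: dict) -> dict:
    """Scale kcat/KM values so that the best substrate has ratio 1.0."""
    best = max(kcat_over_km.values())
    if best <= 0:
        raise ValueError("at least one peptide must be cleaved")
    return {pep: value / best for pep, value in kcat_over_km.items()}


def specificity_index(kcat_over_km: dict) -> float:
    """Assumed index: 0 for a protease that cleaves all peptides equally,
    1 for a protease that accepts exactly one peptide of the panel."""
    rel = relative_ratios(kcat_over_km)
    n = len(rel)
    if n < 2:
        raise ValueError("need at least two peptides to compare")
    return (n - sum(rel.values())) / (n - 1)


# Hypothetical kcat/KM values (arbitrary units) for three peptides.
nonspecific = {"pepA": 5.0, "pepB": 5.0, "pepC": 5.0}
selective = {"pepA": 100.0, "pepB": 50.0, "pepC": 0.0}
print(specificity_index(nonspecific))
print(specificity_index(selective))
```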

Comparison of the primary, secondary as well as the tertiary structure of 
proteases (Fersht, A., Enzyme Structure and Mechanism, W. H. Freeman and 
Company, New York, 1995) allows identification of classes showing a high degree 
of conservation (Rawlings, N.D. & Barrett, A.J. (1997) In: Proteolysis in Cell
Functions Eds. Hopsu-Havu, V.K.; Jarvinen, M.; Kirschke, H., pp. 13-21, IOS Press,
Amsterdam). A widely accepted scheme for protease classification has been
proposed by Rawlings & Barrett (Handbook of proteolytic enzymes. (1998) Eds:
Barrett, A.; Rawlings, N.; Woessner, J.; Academic Press, London). For example, the
serine protease family can be subdivided into structural classes with
chymotrypsin (class S1), subtilisin (class S8) and carboxypeptidase (class SC)
folds, each of which includes nonspecific as well as specific proteases (Rawlings,
N.D. & Barrett, A.J. (1994) Methods Enzymol. 244, 19-61). This applies to other
protease families analogously. An additional distinction can be made according to 
the relative location of the cleaved bond in the substrate. Carboxy- and 
aminopeptidases cleave amino acids from the C- and N-terminus, respectively, 
while endopeptidases cut anywhere along the oligopeptide.

Many applications would be conceivable if enzymes with a basically unlimited
spectrum of specificities were available. However, the use of such enzymes with 
high, low or any defined specificity is currently limited to those which can be 
isolated from natural sources. The field of application for these enzymes varies 
from therapeutic, research, diagnostic, nutritional to personal care and industrial 
purposes. 

Enzyme additives in detergents have come to constitute nearly a third of the 
whole industrial enzyme market. Detergent enzymes include proteinases for 
removing organic stains, lipases for removing greasy stains, amylases for 
removing residues of starchy foods and cellulases for restoring the smooth surface
of the fiber. The best known detergent enzyme is probably the nonspecific 
proteinase subtilisin, isolated from various Bacillus species. 

Starch enzymes, such as amylases, account for the majority of the enzymes used in food
processing. While starch enzymes include products that are important for textile
desizing, alcohol fermentation, paper and pulp processing, and laundry detergent 
additives, the largest application is for the production of high fructose corn 
syrup. The production of corn syrup from starch by means of industrial enzymes 
was a successful alternative to acid hydrolysis. 

Apart from starch processing, enzymes are used for an increasing range of 
applications in food. Enzymes in food can improve texture, appearance and 
nutritional value or may generate desirable flavours and aromas. Food enzymes
currently used in baking include amylases, amyloglycosidases, pentosanases for
the breakdown of pentosan and reduced gluten production, and glucose oxidases to
increase the stability of dough. Common enzymes for dairy are rennet (protease)
as coagulant in cheese production, lactase for hydrolysis of lactose, protease for
hydrolysis of whey proteins and catalase for the removal of hydrogen peroxide.
Enzymes used in the brewing process are the above-named amylases, but also
cellulases or proteases to clarify the beer from suspended proteins. In wines and
fruit juices, cloudiness is more commonly caused by starch and pectins so that
amylases and pectinases increase yield and clarification. Papain and other 
proteinases are used for meat tenderizing. 



WO 2004/113521 



PCT/EP2004/051172 




Enzymes have also been developed to aid animals in the digestion of feed. In the 
western hemisphere, corn is a major source of food for cattle, swine, and 
poultry. In order to improve the bioavailability of phosphate from corn, phytase 
is commonly added (Wyss, M. et al. Biochemical characterization of fungal 
phytases (myo-inositol hexakisphosphate phosphohydrolases): Catalytic 
properties. Applied & Environmental Microbiology 65, 367-373 (1999)).
Moreover, phytate hydrolysis has been shown to bring about improvements in 
digestibility of protein and absorption of minerals such as calcium (Bedford, M. R. 
& Schulze, H. EXOGENOUS ENZYMES FOR PIGS AND POULTRY [Review]. 
Nutrition Research Reviews 11, 91-114 (1998)). Another major feed enzyme is 
xylanase. This enzyme is particularly useful as a supplement for feedstuffs
comprising more than about 10% of wheat, barley or rye, because of their
relatively high soluble fiber content. Xylanases have two important effects:
reduction of the viscosity of the intestinal contents by hydrolyzing the gel-like high
molecular weight arabinoxylans in feed (Murphy, T. C., Bedford, M. R. &
McCracken, K. J. Effect of a range of new xylanases on in vitro viscosity and on
performance of broiler diets. British Poultry Science 44, S16-S18 (2003)) and
breakdown of polymers in cell walls, which improves the bioavailability of protein
and starch.

Biotech research and development laboratories routinely use special enzymes in
small quantities along with many other reagents. Together, these enzymes
constitute a significant market. Alkaline phosphatase,
horseradish peroxidase and luciferase are only some examples. Thermostable
DNA polymerases like Taq polymerase or restriction endonucleases 
revolutionized laboratory work. Therapeutic enzymes are a particular class of
drugs, categorized by the FDA as biologicals, with many advantages compared
to other, especially non-biological, pharmaceuticals. Examples of successful
therapeutic enzymes are human clotting factors like factor VIII and factor IX for
human treatment. In addition, digestive enzymes are used for various 
deficiencies in human digestive processes. Other examples are t-PA and 
streptokinase for the treatment of cardiovascular disease, beta- 
glucocerebrosidase for the treatment of Type I Gaucher disease, L-asparaginase 
for the treatment of acute lymphoblastic leukemia and DNAse for the
treatment of cystic fibrosis. An important issue in the application of proteins as
therapeutics is their potential immunogenicity. To reduce this risk, one would
prefer enzymes of human origin, which narrows down the set of available 
enzymes. The provision of designed enzymes, preferably of human origin, with 
novel, tailor-made specificities would allow the specific modification of target 
substrates at will, while minimizing the risk of immunogenicity. A further 
advantage of highly specific enzymes as therapeutics would be their lower risk of 
side effects. Due to the limited possibility of specific interactions between a small 
molecule and a protein, binding to non-target proteins and therefore side effects 
are quite common and often cause termination of an otherwise promising lead 
compound. Specific enzymes, on the other hand, provide many more contact 
sites and mechanisms for substrate discrimination and therefore enable a higher 
specificity and thereby fewer side activities.

Proteases represent an important class of therapeutic agents (Drugs of Today,
33, 641-648 (1997)). However, currently the therapeutic protease is usually a
substitute for insufficient activity of the body's own proteases. For example,
factor VII can be administered in certain cases of coagulation deficiencies of 
bleeders or during surgery (Heuer L.; Blumenberg D. (2002) Anaesthesist 
51:388). Tissue-type plasminogen activator (t-PA) is applied in acute cardiac 
infarction, initializing the dissolution of fibrin clots through specific cleavage and 
activation of plasminogen (Verstraete, M. et al. (1995) Drugs, 50, 29-41). So far,
no protease with tailor-made specificity has been generated to provide a therapeutic
agent that specifically activates or inactivates a disease-related target protein.

Monoclonal antibodies represent another important biological class of substances 
with therapeutic capabilities. Among the main antibody targets are the tumor
necrosis factors (TNFs), which belong to the family of cytokines. TNFs play a
major role in the inflammation process. As homotrimers they can bind to
receptors of nearly every cell. They activate a multiplicity of cellular genes, 
multiple signal transduction mechanisms, kinases and transcription factors. The 
most important TNFs are TNF-alpha and TNF-beta. TNF-alpha is produced by 
macrophages, monocytes and other cells. TNF-alpha is an inflammation 
mediator. Therefore, research of the last decade has focused on TNF-alpha
inhibitors like monoclonal antibodies as possible therapeutics for different
indications like rheumatoid arthritis, Crohn's disease or psoriasis
(Hamilton et al. (2000) Expert Opin Pharmacother, 1 (5): 1041-1052). One of
the major disadvantages of monoclonal antibodies is their high cost, so that
new biological alternatives are of great importance.

The literature contains many examples of engineered enzymes. Fulani et al.
(Fulani F. et al. (2003) Protein Engineering 16, 515-519) describe a rhodanese
(thiosulfate:cyanide sulfurtransferase) from Azotobacter vinelandii which has a
catalytic domain structurally related to the catalytic subunit of Cdc25 phosphatase
enzymes. The difference in catalytic mechanism depends on the different size of
the active site. Both rhodanese and phosphatase are highly specific for different
substrates (sulfate vs. phosphate). The catalytic mechanism of the rhodanese
could be shifted towards serine/threonine phosphatase by single-residue
insertion. Therefore, Fulani et al. give a single example for the change of a
catalytic mechanism by structural comparison and sequence alignment of 
naturally known enzymes from different enzyme classes but lack an indication of 
how to generate a user-definable substrate specificity while keeping the same 
catalytic mechanism. 

The thioredoxin reductase described by Briggs et al. (WO 02/090300 A2) has an
altered cofactor specificity which preferentially binds NADPH compared to NADH.
Thus, both enzymes, the starting point as well as the resulting engineered 
enzyme are highly specific towards different substrates. The methods to achieve 
such an altered substrate specificity are either computational processing 
methods or sequence alignments of related proteins to define variable and 
conserved residues. They all have in common that they are based on the 
comparison of structures and sequences of proteins with known specificities 
followed by the transfer of the same to another backbone. 

There are other examples of specificity-engineered enzymes and, in particular, of 
proteases which have been published in the literature. None of these examples, 
however, provides a means for generating novel specificities compared to the
specificity of the starting material used within the described methods. The 
methods range from structure-directed single point mutations (Kurth, T. et al. 
(1998) Biochemistry 37, 11434-11440; Ballinger, M et al. (1996) Biochemistry, 
35:13579-13585), exchange of surface loops between two specific proteases 
(Horrevoets et al. (1993) J. Biol. Chem. 268, 779-782), to random mutagenesis
either regio-selectively or across the whole gene combined with in-vitro or in-
vivo selection (Sices, H. & Kristie, T. (1998) Proc. Natl. Acad. Sci. USA, 95,
2828-2833).

The rational design of protease specificity is limited to very few examples. This 
approach is severely limited by the insufficient understanding of the complexities 
that govern folding and dynamics as well as structure-function relationships in 
proteins (Corey, M.J. & Corey, E. (1996) Proc. Natl. Acad. Sci. USA, 93:11428-
11434). It is therefore difficult to alter the primary amino acid sequence of a
protease in order to change its activity or specificity in a predictive way. In a
successful example, Kurth et al. engineered trypsin to show a preference for a
dibasic motif (Kurth, T. et al. (1998) Biochemistry, 37:11434-11440). In
another example, Hedstrom et al. converted the S1 substrate specificity of
trypsin to that of chymotrypsin (Hedstrom, L. et al. (1992) Science, 255:1249- 
1253). This is an example where a known property was transferred from one 
backbone to another. 

Ballinger et al. (WO 96/27671) describe subtilisin variants with combination
mutations (N62D/G166D, and optionally Y104D) having a shift of substrate
specificity towards peptide or polypeptide substrates with basic amino acids at
the P1, P2 and P4 positions of the substrate. Suitable substrates of the variant
subtilisin were revealed by sorting a library of phage particles (substrate phage)
containing five contiguous randomized residues. These subtilisin variants are
useful for cleaving fusion proteins with basic substrate linkers and processing
hormones or other proteins (in vitro or in vivo) that contain basic cleavage sites.

The problems associated with rational redesign of enzymes can partially be
overcome by directed evolution (as disclosed in PCT/EP03/04864). These studies 
can be classified by their expression and selection systems. Genetic selection 
means to produce inside an organism an enzyme, e.g. a protease, which is able 
to cleave a precursor protein which in turn results in an alteration of the growth 
behavior of the producing organism. From a population of organisms with 
different proteases those can be selected which have an altered growth behavior. 
This principle was for example reported by Davis et al. (US 5258289, WO 
96/21009). The production of a phage system is dependent on the cleavage of a 
phage protein which only can be activated in the presence of a proteolytic
enzyme which is able to cleave the phage protein. Other approaches use a 
reporter system which allows a selection by screening instead of a genetic 
selection, but also cannot overcome the intrinsic insufficiency of the intracellular 
characterization of enzymes. 

Systems to generate enzymes with altered sequence specificities with self- 
secreting enzymes are also reported. Duff et al. (WO 98/11237) describe an 
expression system for a self-secreting protease. An essential element of the 
experimental design is that the catalytic reaction acts on the protease itself by an 
autoproteolytic processing of the membrane-bound precursor molecule to release 
the matured protease from the cellular membrane into the extracellular 
environment. Therefore, a fusion protein must be constructed where the target 
peptide sequence replaces the natural cleavage site for autoproteolysis. 
Limitations of such a system are that positively identified proteases will have the 
ability to cleave a certain amino acid sequence but they also may cleave many 
other peptide sequences. Therefore, high substrate specificity cannot be
achieved. Additionally, such a system is not able to control that selected
proteases cleave at a specific position in a defined amino acid sequence and it
does not allow a precise characterization of the kinetic constants of the selected
proteases (kcat, KM).

A method has been described that aims at the generation of new catalytic 
activities and specificities within the α/β-barrel proteins (WO 01/42432; Fersht et
al., Methods of producing novel enzymes; Altamirano et al. (2000) Nature 403,
617-622). The α/β-barrel proteins comprise a large superfamily of proteins
accounting for a large fraction of all known enzymes. The structure of the
proteins consists of an α/β-barrel surrounded by α-helices. The loops connecting
β-strands and helices comprise the so-called lid structure including the active
site residues. The method is based on the classification of α/β-barrel proteins
into two classes based on the catalytic lid structure. An extensive comparison of
α/β-barrel protein structures led the authors to the conclusion that the substrate
binding and specificity is primarily defined by the barrel structure while the 
specificity of the chemical reaction resides within the loops. It is suggested that 
barrels and lid structures from different enzymes can be combined to generate 
new enzymatic activities and to provide a starting point to fine tune the
properties by targeted or randomized mutagenesis and selection. The method 
does not provide for the generation of user-defined specificity. 

In summary, it is clear that there are many possible applications in the fields of 
therapeutics, research and diagnostics, industrial enzymes, food and feed
processing, cosmetics and other areas that would become possible by the 
availability of enzymes with a novel substrate specificity. However, only a limited 
number of specific enzymes has been identified from natural sources so far. 
Methods of rational design to modify, alter, convert or transfer sequence 
specificity as well as random approaches described above did not enable the 
generation of a novel and user-definable specificity that was not present in the
employed starting material. 

Therefore, none of the currently available methods can provide enzymes with a 
novel and user-defined sequence specificity. In contrast, the current invention 
provides such enzymes as well as methods for generating them. 

Summary of the Invention 

The objective of the present invention is to provide engineered proteins with 
novel functions that do not exist in the components used for the engineering of 
such proteins. In particular, the invention provides enzymes with user-definable 
specificities. User-definable specificity means that enzymes are provided with 
specificities that do not exist in the components used for the engineering of such 
enzymes. The specificities can be chosen by the user so that one or more 
intended target substrates are preferentially recognised and converted by the 
enzymes. Furthermore, the invention provides enzymes that possess essentially 
identical sequences to human proteins but have different specificities. In a 
particular embodiment, the invention provides proteases with user-definable 
specificities. 

Furthermore, the present invention is directed to engineered enzymes which are 
fused to one or more further functional components. These further components 
can be proteinaceous components which preferably have binding properties and
are of the group consisting of substrate binding domains, antibodies, receptors or
fragments thereof. Furthermore, these further components can be further
functional components, preferably being selected from the group consisting of
polyethylene glycols, carbohydrates, lipids, fatty acids, nucleic acids, metals, metal
chelates, and fragments or derivatives thereof. The resulting fusion proteins are 
understood as enzymes with user-definable specificities within the present 
invention. 

Besides, the invention is directed to the application of such enzymes with novel, 
user-definable specificities for therapeutic, research, diagnostic, nutritional, 
personal care or industrial purposes. Moreover, the invention is directed to a 
method for generating engineered enzymes with user-definable specificities. In 
particular, the invention is directed to generate enzymes that possess essentially 
identical sequences to human enzymes but have different specificities. 

This problem has been solved by the embodiments of the invention specified in 
the description below and in the claims. The present invention is thus directed to 

(1) an engineered enzyme with defined specificity characterized by the 
combination of the following components:

(a) a protein scaffold which catalyzes at least one chemical reaction on at least 
one substrate, and 

(b) one or more specificity determining regions (SDRs) located at sites in the 
protein scaffold that enable the resulting engineered protein to discriminate 
between at least one target substrate and one or more different substrates, and 
wherein the SDRs are essentially synthetic peptide sequences; 

(2) the use of an engineered enzyme as defined in (1) above for therapeutic, 
research, diagnostic, nutritional, personal care or industrial purposes; 

(3) a method for generating engineered enzymes as defined in (1) above having 
specificities towards target substrates, such specificities not being present in the 
individual starting components, comprising at least the following steps: 

(a) providing a protein scaffold which catalyzes at least one chemical reaction on 
at least one substrate, 

(b) generating a library of engineered enzymes by combining the protein scaffold 
from step (a) with fully or partially random peptide sequences at sites in the 
protein scaffold that enable the resulting engineered enzyme to discriminate
between at least one target substrate and one or more different substrates, and 
(c) selecting out of the library of engineered enzymes generated in step (b) one 
or more enzymes that have specificities towards at least one target substrate; 

(4) a fusion protein which is comprised of at least one engineered enzyme as 
defined in (1) above and at least one further component, preferably the at least 
one further component having binding properties and more preferably being 
selected from the group consisting of antibodies, binding domains, receptors, and
fragments thereof; 

(5) a composition or pharmaceutical composition comprising one or more 
engineered enzymes as defined in (1) above or a fusion protein as defined in (4) 
above, said pharmaceutical composition may optionally comprise an acceptable 
carrier, excipient and/or auxiliary agent; 

(6) a DNA encoding the engineered enzyme as defined in (1) above; 

(7) a vector comprising the DNA as defined in (6) above; 

(8) a host cell or transgenic organism being transformed/transfected with a 
vector as defined in (7) above and/or containing the DNA as defined in (6) 
above; and 

(9) a method for producing the engineered enzyme comprising culturing a cell or 
organism as defined in (8) above and isolating the enzyme from the culture 
broth. 
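The three steps of the method in (3) can be pictured with a toy in-silico sketch: a scaffold sequence, a set of insertion sites, random peptide inserts, and a selection step. Everything below is a hypothetical illustration and not the procedure of the invention: the scaffold string, the site positions, the insert length and in particular the scoring function are assumptions, and selection in practice relies on an experimental readout rather than on a computed score.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter code


def random_peptide(length, rng):
    """Fully random peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def insert_sdrs(scaffold, sites, peptides):
    """Insert one peptide before each 0-based scaffold position in sites."""
    variant = scaffold
    # Work from the highest position down so that earlier indices stay valid.
    for pos, pep in sorted(zip(sites, peptides), reverse=True):
        variant = variant[:pos] + pep + variant[pos:]
    return variant


def build_library(scaffold, sites, sdr_length, size, seed=0):
    """Steps (a) and (b): combine the scaffold with random peptide inserts."""
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        peptides = [random_peptide(sdr_length, rng) for _ in sites]
        library.append((insert_sdrs(scaffold, sites, peptides), peptides))
    return library


def select(library, score, top_n):
    """Step (c): keep the top_n variants under a caller-supplied score."""
    return sorted(library, key=score, reverse=True)[:top_n]


scaffold = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, not a real enzyme
sites = [5, 12]                      # hypothetical insertion sites
library = build_library(scaffold, sites, sdr_length=4, size=10, seed=1)


def toy_score(entry):
    # Placeholder for an experimental readout: counts arginines in the inserts.
    return sum(pep.count("R") for pep in entry[1])


hits = select(library, toy_score, top_n=3)
for variant, peptides in hits:
    print(variant, peptides)
```

The scaffold residues are left untouched in every variant; only the inserted segments differ between library members, which mirrors the division between protein scaffold and SDRs described in (1).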

Brief description of the Figures

The following figures are provided in order to explain further the present 
invention in supplement to the detailed description: 

Figure 1 illustrates the three-dimensional structure of human trypsin I with the 
active site residues shown in "ball-and-stick" representation and with the marked 
regions indicating potential SDR insertion sites. 

Figure 2 shows the alignment of the primary amino acid sequence of three 
members of the serine protease class S1 family: human trypsin I, human alpha-
thrombin and human enteropeptidase (see also SEQ ID NOs: 1, 5 and 6). 

Figure 3 illustrates the three-dimensional structure of subtilisin with the active 
site residues being shown in "ball-and-stick" representation and with the
numbered regions indicating potential SDR insertion sites. 

Figure 4 shows the alignment of the primary amino acid sequences of four 
members of the serine protease class S8 family: subtilisin E, furin, PC1 and PC5
(see also SEQ ID NOs: 7-10). 

Figure 5 illustrates the three-dimensional structure of pepsin with the active site 
residues being shown in "ball-and-stick" representation and with the numbered 
regions indicating potential SDR insertion sites. 

Figure 6 shows the alignment of the primary amino acid sequences of three 
members of the A1 aspartic acid protease family: pepsin, β-secretase and
cathepsin D (see also SEQ ID NOs: 11-13). 

Figure 7 illustrates the three-dimensional structure of caspase 7 with the active
site residues being shown in "ball-and-stick" representation and with the 
numbered regions indicating potential SDR insertion sites. 

Figure 8 shows the primary amino acid sequence of caspase 7 as a member of
the cysteine protease class C14 family (see also SEQ ID NO: 14). 

Figure 9 depicts schematically the third aspect of the invention. 

Figure 10 shows a Western blot analysis of a culture supernatant of cells
expressing variants of human trypsin I with SDR1 and SDR2, compared to 
negative controls. 

Figure 11 shows the time course of the proteolytic cleavage of a target substrate 
by human trypsin I. 

Figure 12 shows the relative activities of three variants of inventive engineered 
proteolytic enzymes in comparison with human trypsin I on two different peptide 
substrates. 

Figure 13 shows the relative specificities of human trypsin I and variants of 
inventive engineered proteolytic enzymes with one or two SDRs, respectively. 

Figure 14 shows the relative specificities of human trypsin I and of variants of
inventive engineered proteolytic enzymes being specific for human TNF-alpha 
with this scaffold on peptides with a target sequence of human TNF-alpha. 

Figure 15 shows the reduction of cytotoxicity induced by TNF-alpha when
incubating the TNF-alpha with concentrated supernatant from cultures 
expressing the inventive engineered proteolytic enzymes being specific for 
human TNF-alpha. 

Figure 16 shows the reduction of cytotoxicity induced by TNF-alpha when
incubating the TNF-alpha with purified inventive engineered proteolytic enzyme 
being specific for human TNF-alpha. 

Figure 17 compares the activity of inventive engineered proteolytic enzymes
being specific for human TNF-alpha with the activity of human trypsin I on two 
protein substrates: (a) human TNF-alpha; (b) mixture of human serum proteins. 

Figure 18 shows the specific activity of an inventive engineered proteolytic
enzyme with specificity for human VEGF. 

Definitions

In the framework of the present invention the following terms and definitions are 
used. 

The term "protease" means any protein molecule that is capable of hydrolysing 
peptide bonds. This includes naturally-occurring or artificial proteolytic enzymes,
as well as variants thereof obtained by site-directed or random mutagenesis or 
any other protein engineering method, any active fragment of a proteolytic 
enzyme, or any molecular complex or fusion protein comprising one of the 
aforementioned proteins. A "chimera of proteases" means a fusion protein of two 
or more fragments derived from different parent proteases. 

The term "substrate" means any molecule that can be converted catalytically by 
an enzyme. The term "peptide substrate" means any peptide, oligopeptide, or 
protein molecule of any amino acid composition, sequence or length, that 
contains a peptide bond that can be hydrolyzed catalytically by a protease. The 
peptide bond that is hydrolyzed is referred to as the "cleavage site". Numbering 
of positions in the substrate is done according to the system introduced by 
Schechter & Berger (Biochem. Biophys. Res. Commun. 27 (1967) 157-162).
Amino acid residues adjacent N-terminal to the cleavage site are numbered P1,
P2, P3, etc., whereas residues adjacent C-terminal to the cleavage site are
numbered P1', P2', P3', etc.
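
By way of illustration only, the numbering of substrate positions relative to the cleavage site may be sketched in Python as follows. The function name `schechter_berger_labels`, the convention of giving the cleavage site as the number of residues N-terminal to the hydrolyzed bond, and the example peptide are assumptions made for this sketch and are not part of the disclosure.

```python
def schechter_berger_labels(peptide: str, n_terminal_count: int) -> list[tuple[str, str]]:
    """Label each residue of a peptide relative to the cleavage site.

    n_terminal_count is the number of residues on the N-terminal side of the
    hydrolyzed bond (a hypothetical convention chosen for this sketch).
    """
    if not 0 < n_terminal_count < len(peptide):
        raise ValueError("the cleavage site must lie between two residues")
    labels = []
    for index, residue in enumerate(peptide):
        if index < n_terminal_count:
            # N-terminal side: P1 is adjacent to the bond, P2 is next, and so on.
            labels.append((f"P{n_terminal_count - index}", residue))
        else:
            # C-terminal side: P1' is adjacent to the bond, P2' is next, and so on.
            labels.append((f"P{index - n_terminal_count + 1}'", residue))
    return labels


# Hypothetical peptide cleaved between its third and fourth residue.
for label, residue in schechter_berger_labels("AGRSLK", 3):
    print(label, residue)
```

In this sketch the residue R occupies position P1 and the residue S occupies position P1'.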

The term "target substrate" describes a user-defined substrate which is
specifically recognized and converted by an enzyme according to the invention. 
The term "target peptide substrate" describes a user-defined peptide substrate. 
The term "target specificity" describes the qualitative and quantitative specificity 
of an enzyme that is capable of recognizing and converting a target substrate. 
Catalytic properties of enzymes are expressed using the kinetic parameters "KM"
or "Michaelis-Menten constant", "kcat" or "catalytic rate constant", and "kcat/KM"
or "catalytic efficiency", according to the definitions of Michaelis and Menten
(Fersht, A., Enzyme Structure and Mechanism, W. H. Freeman and Company, 
New York, 1995). The term "catalytic activity" describes quantitatively the 
conversion of a given substrate under defined reaction conditions. 
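
The relationship between these kinetic parameters may be illustrated by the following minimal Python sketch of the Michaelis-Menten rate law. The function names and all numerical values are hypothetical and serve only to show how KM, kcat and kcat/KM relate to each other; they are not measured data.

```python
def initial_rate(kcat: float, km: float, enzyme_conc: float, substrate_conc: float) -> float:
    """Michaelis-Menten initial rate: v = kcat * [E] * [S] / (KM + [S])."""
    if km <= 0 or enzyme_conc < 0 or substrate_conc < 0:
        raise ValueError("KM must be positive and concentrations non-negative")
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat/KM."""
    if km <= 0:
        raise ValueError("KM must be positive")
    return kcat / km


# Hypothetical example values (not taken from the disclosure).
kcat = 10.0        # catalytic rate constant, 1/s
km = 2.0e-5        # Michaelis-Menten constant, mol/L
enzyme = 1.0e-9    # enzyme concentration, mol/L

# At a substrate concentration equal to KM the rate is half of kcat * [E].
print(initial_rate(kcat, km, enzyme, substrate_conc=km))
print(catalytic_efficiency(kcat, km))
```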

The term "specificity" means the ability of an enzyme to recognize and convert
preferentially certain substrates. Specificity can be expressed qualitatively and 
quantitatively. "Qualitative specificity" refers to the chemical nature of the 
substrate residues that are recognized by an enzyme. "Quantitative specificity" 
refers to the number of substrates that are accepted as substrates. Quantitative 
specificity can be expressed by the term s, which is defined as the negative 
logarithm of the number of all accepted substrates divided by the number of all 
possible substrates. Proteases, for example, that accept preferentially a small 
portion of all possible peptide substrates have a "high specificity". Proteases that 
accept almost any peptide substrate have a "low specificity". Definitions are
made in accordance with WO 03/095670, which is therefore incorporated by
reference. Proteases with very low specificity are also referred to as "unspecific
proteases". The term "defined specificity" refers to a certain type of specificity,
i.e. to a certain target substrate or a set of certain target substrates that are
preferentially converted versus other substrates. 
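
The quantitative specificity s defined above may be illustrated by the following Python sketch. The text does not state the base of the logarithm; base 10 is assumed here. The function name and the example numbers are likewise hypothetical.

```python
import math


def quantitative_specificity(accepted: int, possible: int, base: float = 10.0) -> float:
    """Return s = -log(accepted / possible).

    The logarithm base is an assumption of this sketch (default: 10).
    """
    if not 0 < accepted <= possible:
        raise ValueError("need 0 < accepted <= possible")
    return -math.log(accepted / possible, base)


# Hypothetical example: all tetrapeptides built from 20 amino acids.
possible = 20 ** 4

# A protease accepting every tetrapeptide has s = 0 (low specificity).
print(quantitative_specificity(possible, possible))

# A protease accepting only 16 of them has s of about 4 (high specificity).
print(quantitative_specificity(16, possible))
```

Under this assumption, a larger value of s corresponds to a smaller fraction of accepted substrates and therefore to a higher specificity.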

The term "engineered" in combination with the term "enzyme" describes an 
enzyme that is comprised of different components and that has features not 
being conferred by the individual components alone. 

The term "protein scaffold" or "scaffold protein" refers to a variety of primary, 
secondary and tertiary polypeptide structures. 

The term "peptide sequence" indicates any peptide sequence used for insertion 
or substitution into or combination with a protein scaffold. Peptide sequences are 
usually obtained by expression from DNA sequences which can be synthesized 
according to well-established techniques or can be obtained from natural 
sources. Insertion, substitution or combination of peptide sequences with the 
protein scaffold are generated by insertion, substitution or combination of 
oligonucleotides into or with a polynucleotide encoding the protein scaffold. The 
term "synthetic" in combination with the term "peptide sequence" refers to 
peptide sequences that are not present in the protein scaffold in which the 
peptide sequences are inserted or substituted or with which they are combined. 

The term "components" in combination with the term "engineered enzyme"
refers to peptide or polypeptide sequences that are combined in the engineering 
of such enzymes. Such components may among others comprise one or more 
protein scaffolds and one or more synthetic peptide sequences. The term "library 
of engineered enzymes" describes a mixture of engineered enzymes, whereby 
every single engineered enzyme is encoded by a different polynucleotide 
sequence. The term "gene library" indicates a library of polynucleotides that 
encodes the library of engineered enzymes. The term "SDR" or "Specificity 
determining region" refers to a synthetic peptide sequence that provides the 
defined specificity when combined with the protein scaffold at sites that enable 
the resulting enzymes to discriminate between the target substrate and one or 
more other substrates. Such sites are termed "SDR sites". 

The terms "tertiary structure similar to the structure of" and "similar tertiary 
structure" in combination with the terms "enzyme" or "protein" refer to proteins 
in which the type, sequence, connectivity and relative orientation of the typical 
secondary structural elements of a protein, e.g. alpha-helices, beta-sheets, beta- 
turns and loops, are similar and the proteins are therefore grouped into the same 
structural or topological class or fold. This includes proteins that have altered, 
additional or deleted structural elements of any type but otherwise unchanged 
topology. Examples of such structural classes are the TNF superfamily, the S1
fold or the S8 fold within the serine proteases, the GPCRs, or the α/β-barrel fold.

The term "positions that correspond structurally" indicates amino acids in 
proteins of similar tertiary structure that correspond structurally to each other, 
i.e. they are usually located within the same structural or topological element of 
the structure. Within the structural element they possess the same relative 
positions with respect to beginning and end of the structural element. If, e.g. the 
topological comparison of two proteins reveals two structurally corresponding 
sequences of different length, then amino acids within, e.g. 20% and 40% of the 
respective region lengths, correspond to each other structurally. 
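
The notion of equal relative positions within structurally corresponding regions of different length may be sketched as follows. This is only one possible reading of the definition, namely a linear mapping of the relative position followed by rounding. The function name and the residue ranges are hypothetical.

```python
def corresponding_position(position: int,
                           region_a: tuple[int, int],
                           region_b: tuple[int, int]) -> int:
    """Map a residue in region_a to the residue at the same relative
    position in region_b. Regions are inclusive (start, end) residue numbers.

    Linear interpolation with rounding is an assumption of this sketch.
    """
    start_a, end_a = region_a
    start_b, end_b = region_b
    if not start_a <= position <= end_a:
        raise ValueError("position lies outside region_a")
    span_a = end_a - start_a
    span_b = end_b - start_b
    # Relative position between 0.0 (beginning) and 1.0 (end) of region_a.
    fraction = (position - start_a) / span_a if span_a else 0.0
    return start_b + round(fraction * span_b)


# Hypothetical regions: residues 10-20 in one protein, 30-50 in another.
print(corresponding_position(10, (10, 20), (30, 50)))  # beginning maps to beginning
print(corresponding_position(15, (10, 20), (30, 50)))  # midpoint maps to midpoint
print(corresponding_position(20, (10, 20), (30, 50)))  # end maps to end
```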

The term "library of engineered enzymes" of the present invention refers to a 
multiplicity of enzymes or enzyme variants, which may exist as a mixture or in 
isolated form. 

Amino acid residues are abbreviated according to the following Table 1 either in
one- or in three-letter code. 

Table 1: Amino acid abbreviations

One-letter code   Three-letter code   Amino acid
A                 Ala                 Alanine
C                 Cys                 Cysteine
D                 Asp                 Aspartic acid
E                 Glu                 Glutamic acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine

Detailed description of the invention 

The present invention provides engineered proteins with novel functions. In 
particular, the invention provides enzymes with user-definable specificities. In a 
particular embodiment, the invention provides proteases with user-definable 
specificities. Besides, the invention provides applications of such enzymes with 
novel, user-definable specificities for therapeutic, research, diagnostic, 
nutritional, personal care or industrial purposes. Moreover, the invention 
provides a method for generating enzymes with specificities that are not present
in the components used for the engineering of such enzymes. In particular, the
invention is directed to the generation of enzymes that have sequences that are 
essentially identical to mammalian, especially human enzymes but have different 
specificities. Moreover, the invention provides libraries of specific engineered
enzymes with corresponding specificities encoded genetically, a method for the 
generation of libraries of specific engineered enzymes with corresponding 
specificities encoded genetically, and the application of such libraries for 
technical, diagnostic, nutritional, personal care or research purposes. 

A first aspect of the invention discloses engineered enzymes with defined 
specificities. These engineered enzymes are characterized by the following 
components: 

(a) a protein scaffold capable of catalyzing at least one chemical reaction on a 
substrate, and 

(b) one or more specificity determining regions (SDRs) located at sites in the 
protein scaffold that enable the resulting engineered protein to discriminate 
between at least one target substrate and one or more different substrates,
wherein the SDRs are essentially synthetic peptide sequences. 

Preferably, such defined specificity of the engineered enzymes is not conferred 
by the protein scaffold. 

In principle, the protein scaffold can have a variety of primary, secondary and 
tertiary structures. The primary structure, i.e. the amino acid sequence, can be 
an engineered sequence or can be derived from any viral, prokaryotic or 
eukaryotic origin. For human therapeutic use, however, the protein scaffold is 
preferably of mammalian origin, and more preferably, of human origin. 
Furthermore, the protein scaffold is capable of catalyzing one or more chemical
reactions and preferably has only a low specificity.

Preferably, derivatives of the protein scaffold are used that have modified amino 
acid sequences that confer improved characteristics for the applicability as 
protein scaffolds. Such improved characteristics comprise, but are not limited to, 
stability; expression or secretion yield; folding, in particular after combination of 
the protein scaffold with SDRs; increased or decreased sensitivity to regulators 
such as activators or inhibitors; immunogenicity; catalytic rate; KM or substrate
affinity. 

The engineered enzymes derive their quantitative specificity from the synthetic
peptide sequences that are combined with the protein scaffold. Therefore, the
engineered peptide sequences act as Specificity Determining Regions or
SDRs. The number, the length and the positions of such SDRs can vary over a 
wide range. The number of SDRs within the scaffold is at least one, preferably 
more than one, more preferably between two and eleven, most preferably 
between two and six. The SDRs have a length between one and 50 amino acid 
residues, preferably a length between one and 15 amino acid residues, more 
preferably a length between one and six amino acid residues. Alternatively, the 
SDRs have a length between two and 20 amino acid residues, preferably a length 
between two and ten amino acid residues, more preferably a length between 
three and eight amino acid residues. 
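
To give an impression of the sequence space spanned by such SDRs, the following Python sketch computes the theoretical number of distinct SDR combinations. It assumes, purely for illustration, that every SDR position may be occupied by any of the 20 amino acids of Table 1. The function name and the chosen SDR lengths are hypothetical.

```python
def theoretical_diversity(sdr_lengths: list[int], alphabet_size: int = 20) -> int:
    """Number of distinct sequence combinations for SDRs of the given lengths,
    assuming each position is chosen freely from alphabet_size amino acids."""
    total = 1
    for length in sdr_lengths:
        if length < 1:
            raise ValueError("each SDR must contain at least one residue")
        total *= alphabet_size ** length
    return total


# Hypothetical example: two SDRs of six and five residues, respectively.
print(theoretical_diversity([6, 5]))  # equals 20 ** 11
```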

The inventive engineered enzymes can further be described as antibody-like
protein molecules comprising constant and variable regions, but having a
non-immunoglobulin backbone and having an active site (catalytic activity) in the
constant region, whereby the substrate specificity of the active site is modulated 
by the variable region. Preferably, as in the immunoglobulin structure, the 
variable regions are loops of variable length and composition that interact with a 
target molecule. 

In a particular variant of the invention, the engineered enzymes have hydrolase 
activity. In a preferred variant, the engineered enzymes have proteolytic activity. 
Particularly preferred protein scaffolds for this variant are unspecific proteases or 
are parts from unspecific proteases or are otherwise derived from unspecific 
proteases. The expressions "derived from" or "a derivative thereof" in this 
respect and in the following variants and embodiments refer to derivatives of 
proteins that are mutated at one or more amino acid positions and/or have a 
homology of at least 70%, preferably 90%, more preferably 95% and most 
preferably 99% to the original protein, and/or that are proteolytically processed, 
and/or that have an altered glycosylation pattern, and/or that are covalently 
linked to non-protein substances, and/or that are fused with further protein 
domains, and/or that have C-terminal and/or N-terminal truncations, and/or that
have specific insertions, substitutions and/or deletions. Alternatively, "derived 
from" may refer to derivatives that are combinations or chimeras of two or more 
fragments from two or more proteins, each of which optionally comprises any or 
all of the aforementioned modifications. The tertiary structure of the protein 
scaffold can be of any type. Preferably, however, the tertiary structure belongs 
to one of the following structural classes: class S1 (chymotrypsin fold of the
serine proteases family), class S8 (subtilisin fold of the serine proteases family),
class SC (carboxypeptidase fold of the serine proteases family), class A1 (pepsin
A fold of the aspartic proteases), or class C14 (caspase-1 fold of the cysteine 
proteases). Examples of proteases that can serve as the protein scaffold of 
engineered proteolytic enzymes for the use as human therapeutics are or are 
derived from human trypsin, human thrombin, human chymotrypsin, human 
pepsin, human endothiapepsin, human caspases 1 to 14, and/or human furin. 
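
The homology thresholds mentioned above (at least 70%, preferably 90%, 95% or 99%) may be checked, for example, on two sequences that have already been aligned. The following Python sketch computes a simple percent identity over the aligned, gap-free columns. Treating "homology" as percent identity, the exclusion of gap columns, and the example sequences are assumptions of this sketch; other definitions are in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap in either sequence are ignored
    (one of several possible conventions; an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned sequences differing at one of ten positions.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, identity >= 70.0)
```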

The defined specificity of the engineered proteolytic enzymes is a measure of 
their ability to discriminate between at least one target peptide or protein 
substrate and one or more further peptide or protein substrates. Preferably, the
defined specificity refers to the ability to discriminate peptide or protein
substrates that differ in other positions than the P1 site; more preferably, the
defined specificity refers to the ability to discriminate peptide or protein
substrates that differ in other positions than the P1 site and the P1' site. Most
preferably, the engineered proteolytic enzymes distinguish target peptide or
protein substrates at as many sites as is necessary to preferentially hydrolyse 
the target substrate versus other proteins. As an example, a therapeutically 
useful engineered proteolytic enzyme applied intravenously in the human body 
should be sufficiently specific to discriminate between the target substrate and 
any other protein in the human serum. Preferably, such an engineered 
proteolytic enzyme recognizes and discriminates peptide substrates at three or 
more amino acid positions, more preferably at four or more positions, and even 
more preferably at five or more amino acid positions. These positions may either 
be adjacent or non-adjacent. 

In a first embodiment, the protein scaffold has a tertiary structure or fold equal
or similar to the tertiary structure or fold of the S1 structural subclass of serine
proteases, i.e. the chymotrypsin fold, and/or has at least 70% identity on the
amino acid level to a protein of the S1 structural subclass of serine proteases. It
is preferred that SDRs are inserted into the protein scaffold at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 18-25, 38-48, 54-63, 73-86, 122-130, 
148-156, 165-171 and 194-204 in human trypsin I, and more preferably at one 
or more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 20-23, 41-45, 57-60, 76-83, 125- 
128, 150-153, 167-169 and 197-201 (numbering of amino acids according to 
SEQ ID NO:1). The number of SDRs to be combined with this type of protein
scaffold is preferably between 1 and 10, and more preferably between 2 and 4.
Preferably, the protein scaffold is equal to or is a derivative or homologue of one
or more of the following proteins: chymotrypsin, granzyme, kallikrein, trypsin,
mesotrypsin, neutrophil elastase, pancreatic elastase, enteropeptidase,
cathepsin, thrombin, ancrod, coagulation factor IXa, coagulation factor VIIa,
coagulation factor Xa, activated protein C, urokinase, tissue-type plasminogen 
activator, plasmin, Desmodus-type plasminogen activator. More preferably, the 
protein scaffold is trypsin or thrombin or is a derivative or homologue from 
trypsin or thrombin. For the use as a human therapeutic, the trypsin or thrombin 
scaffold is most preferably of human origin in order to minimize the risk of an 
immune response or an allergenic reaction. 

Preferably, derivatives with improved characteristics derived from human trypsin 
I or from proteins with similar tertiary structure are used. Preferred examples of 
such derivatives are derived from human trypsin I (SEQ ID NO:1) and comprise
one or more of the following amino acid substitutions: E56G; R78W; Y131F;
A146T; C183R. 

It is preferred that at least one of two SDRs is inserted into human trypsin I, or
a derivative thereof, between residues 42 and 43 (SDR 1) and between 123 and
124 (SDR 2), respectively (numbering of amino acids according to SEQ ID NO:1).
In addition the SDR 1 has a preferred length of 6 and the SDR 2 has a preferred 
length of 5 amino acids, respectively. In a preferred variant of this embodiment, 
the SDR 1 and SDR 2 sequences comprise one of the amino acid sequences listed 
in table 2. Such engineered proteolytic enzymes have specificity for the target 
substrate B as exemplified in example IV. 
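
The insertion of SDR 1 and SDR 2 at the stated positions may be sketched at the sequence level as follows. The scaffold string used here is a placeholder and is not the sequence of SEQ ID NO:1, and the SDR sequences are arbitrary examples of length 6 and 5 rather than sequences from table 2. The function name is likewise hypothetical.

```python
def insert_sdrs(scaffold: str, insertions: dict[int, str]) -> str:
    """Insert peptide sequences into a scaffold sequence.

    insertions maps a 1-based residue number N (numbering of the unmodified
    scaffold) to the peptide that is inserted between residues N and N+1.
    """
    for after_residue in insertions:
        if not 1 <= after_residue < len(scaffold):
            raise ValueError(f"no residue pair {after_residue}/{after_residue + 1}")
    result = scaffold
    # Work from the C-terminal end so that earlier insertions do not shift
    # the original numbering of the sites still to be processed.
    for after_residue in sorted(insertions, reverse=True):
        sdr = insertions[after_residue]
        result = result[:after_residue] + sdr + result[after_residue:]
    return result


# Placeholder scaffold of 130 residues (NOT human trypsin I).
scaffold = "A" * 130
# Arbitrary example SDRs: six residues after position 42, five after position 123.
engineered = insert_sdrs(scaffold, {42: "GSGSGS", 123: "TTTTT"})
print(len(engineered))
print(engineered[40:50])
```

In practice such insertions are made at the level of the polynucleotide encoding the protein scaffold, as set out in the definitions; the string operation above only mirrors the resulting amino acid sequence.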

In a further embodiment the protein scaffold belongs to the S8 structural
subclass of serine proteases and/or has a tertiary structure similar to subtilisin E
from Bacillus subtilis and/or has at least 70% identity on the amino acid level to
a protein of the S8 structural subclass of serine proteases. Preferably, the 
scaffold belongs to the subtilisin family or the human pro-protein convertases. It 
is preferred that SDRs are inserted into the protein scaffold at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 6-17, 25-29, 47-55, 59-69, 101-111, 
117-125, 129-137, 139-154, 158-169, 185-195 and 204-225 in subtilisin E from 
Bacillus subtilis, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 59-69, 101-111, 129-137, 158-169 and 204-225 (numbering of 
amino acids according to SEQ ID NO:7). It is preferred that the protein scaffold 
is equal to or is a derivative or homologue of one or more of the following 
proteins: subtilisin Carlsberg; B. subtilis subtilisin E; subtilisin BPN'; B. 
licheniformis subtilisin; B. lentus subtilisin; Bacillus alcalophilus alkaline 
protease; proteinase K; kexin; human pro-protein convertase; human furin. In a
preferred variant, subtilisin BPN' or one of the proteins SPC 1 to 7 is used as the 
protein scaffold. 

In a further embodiment the protein scaffold belongs to the family of aspartic 
proteases and/or has a tertiary structure similar to human pepsin. Preferably, the 
scaffold belongs to the A1 class of proteases and/or has at least 70% identity on
the amino acid level to a protein of the A1 class of proteases. It is preferred that
SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 6-18, 49-55, 74-83, 91-97, 112-120, 126-137, 159- 
164, 184-194, 242-247, 262-267 and 277-300 in human pepsin, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 10-15, 75-80, 
114-118, 130-134, 186-191 and 280-296 (numbering of amino acids according
to SEQ ID NO: 11). It is preferred that the protein scaffold is equal to or is a 
derivative or homologue of one or more of the following proteins: pepsin, 
chymosin, renin, cathepsin, yapsin. Preferably, pepsin or endothiapepsin or a
derivative or homologue thereof is used as the protein scaffold. 

In a further embodiment the protein scaffold belongs to the cysteine protease
family and/or has a tertiary structure similar to human caspase 7. Preferably the 
scaffold belongs to the C14 class of cysteine proteases or has at least 70% 
identity on the amino acid level to a protein of the C14 class of cysteine 
proteases. It is preferred that SDRs are inserted into the protein scaffold at one 
or more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 78-91, 144-160, 186-198, 226- 
243 and 271-291 in human caspase 7, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 80-86, 149-157, 190-194 and 233-238 
(numbering of amino acids according to SEQ ID NO: 14). It is preferred that the 
protein scaffold is equal to or is a derivative or homologue of one of the caspases 
1 to 9. 

In a further embodiment the protein scaffold belongs to the S11 class of serine
proteases or has at least 70% identity on the amino acid level to a protein of the
S11 class of serine proteases and/or has a tertiary structure similar to
D-alanyl-D-alanine transpeptidase from Streptomyces species K15. It is preferred that
SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 67-79, 137-150, 191-206, 212-222 and 241-251 in D- 
alanyl-D-alanine transpeptidase from Streptomyces species K15, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 70-75, 141-147, 
195-202 and 216-220 (numbering of amino acids according to SEQ ID NO: 15). It 
is preferred that the D-alanyl-D-alanine transpeptidase from Streptomyces 
species K15 or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the S21 class of serine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
S21 class of serine proteases and/or has a tertiary structure similar to assemblin 
from human cytomegalovirus. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 25- 
33, 64-69, 134-155, 162-169 and 217-244 in assemblin from human 
cytomegalovirus, and more preferably at one or more positions from the group of
positions that correspond structurally or by amino acid sequence homology to
the regions 27-31, 164-168 and 222-239 (numbering of amino acids according to
SEQ ID NO: 16). It is preferred that the assemblin from human cytomegalovirus 
or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the S26 class of serine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
S26 class of serine proteases and/or has a tertiary structure similar to the signal 
peptidase from Escherichia coli. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 8-14, 
57-68, 125-134, 239-254, 200-211 and 228-239 in signal peptidase from 
Escherichia coli, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 9-13, 60-67, 127-132 and 203-209 (numbering of amino acids 
according to SEQ ID NO: 17). It is preferred that the signal peptidase from 
Escherichia coli or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the S33 class of serine
proteases or has at least 70% identity on the amino acid level to a protein of the 
S33 class of serine proteases and/or has a tertiary structure similar to the prolyl 
aminopeptidase from Serratia marcescens. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 47- 
54, 152-160, 203-212 and 297-302 in prolyl aminopeptidase from Serratia 
marcescens, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 50-53, 154-158 and 206-210 (numbering of amino acids according to 
SEQ ID NO: 18). It is preferred that the prolyl aminopeptidase from Serratia 
marcescens or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the S51 class of serine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
S51 class of serine proteases and/or has a tertiary structure similar to aspartyl
dipeptidase from Escherichia coli. It is preferred that SDRs are inserted into the
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 8-16, 
38-46, 85-92, 132-140, 159-170 and 205-211 in aspartyl dipeptidase from 
Escherichia coli, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 10-14, 87-90, 134-138 and 160-165 (numbering of amino acids 
according to SEQ ID NO: 19). It is preferred that the aspartyl dipeptidase from 
Escherichia coli or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the A2 class of aspartic 
proteases or has at least 70% identity on the amino acid level to a protein of the 
A2 class of aspartic proteases and/or has a tertiary structure similar to the 
protease from human immunodeficiency virus. It is preferred that SDRs are 
inserted into the protein scaffold at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 5-12, 17-23, 27-30, 33-38 and 77-83 in protease from human 
immunodeficiency virus, and more preferably at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 7-10, 18-21, 34-37 and 79-82 (numbering of amino 
acids according to SEQ ID NO:20). It is preferred that the protease from human 
immunodeficiency virus, preferably HIV-1 protease, or a derivative or homologue 
thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the A26 class of
aspartic proteases or has at least 70% identity on the amino acid level to a 
protein of the A26 class of aspartic proteases and/or has a tertiary structure 
similar to the omptin from Escherichia coli. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 28- 
40, 86-98, 150-168, 213-219 and 267-278 in omptin from Escherichia coli, and 
more preferably at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 33- 
38, 161-168 and 273-277 (numbering of amino acids according to SEQ ID 
NO:21). It is preferred that the omptin from Escherichia coli or a derivative or
homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C1 class of cysteine
proteases or has at least 70% identity on the amino acid level to a protein of the
C1 class of cysteine proteases and/or has a tertiary structure similar to the
papain from Carica papaya. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 17-24, 61-68, 
88-95, 135-142, 153-158 and 176-184 in papain from Carica papaya, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 63-66, 136-139 
and 177-181 (numbering of amino acids according to SEQ ID NO:22). It is 
preferred that the papain from Carica papaya or a derivative or homologue 
thereof is used as the scaffold. 
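
Positions that correspond "by amino acid sequence homology" to a region of the reference protein can be located from a pairwise alignment. The following sketch is illustrative only: it assumes that an alignment of the reference sequence and the homologue is already available as two equal-length strings with "-" marking gaps, and all names are hypothetical. Correspondence by structural superposition is not covered by this sketch.

```python
# Illustrative sketch only; names are hypothetical and any alignment passed in
# is assumed to have been computed beforehand.
def map_positions(aligned_ref, aligned_hom, ref_start=1, hom_start=1):
    """Map reference residue numbers to homologue residue numbers.

    Both inputs are rows of one pairwise alignment ('-' marks a gap).
    Reference residues aligned to a gap map to None.
    """
    if len(aligned_ref) != len(aligned_hom):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    ref_pos = ref_start - 1
    hom_pos = hom_start - 1
    for r, h in zip(aligned_ref, aligned_hom):
        if r != "-":
            ref_pos += 1
        if h != "-":
            hom_pos += 1
        if r != "-":
            mapping[ref_pos] = hom_pos if h != "-" else None
    return mapping


def map_region(mapping, start, end):
    """Homologue positions corresponding to reference residues start..end (inclusive)."""
    return [mapping[p] for p in range(start, end + 1)
            if mapping.get(p) is not None]
```

Given such a mapping, a reference region such as 63-66 in papain would be passed as `map_region(mapping, 63, 66)` to obtain the corresponding positions in the homologue.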

In a further embodiment the protein scaffold belongs to the C2 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C2 class of cysteine proteases and/or has a tertiary structure similar to human 
calpain-2. It is preferred that SDRs are inserted into the protein scaffold at one 
or more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 90-103, 160-172, 193-199, 243- 
260, 286-294 and 316-322 in human calpain-2, and more preferably at one or 
more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 92-101, 245-250 and 287-291 
(numbering of amino acids according to SEQ ID NO:23). It is preferred that the 
human calpain-2 or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C4 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C4 class of cysteine proteases and/or has a tertiary structure similar to NIa 
protease from tobacco etch virus. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 23- 
31, 112-120, 144-150, 168-176 and 205-218 in NIa protease from tobacco etch 
virus, and more preferably at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
145-149, 169-174 and 212-218 (numbering of amino acids according to SEQ ID 
NO:24). It is preferred that the NIa protease from tobacco etch virus (TEV 
protease) or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C10 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C10 class of cysteine proteases and/or has a tertiary structure similar to the 
streptopain from Streptococcus pyogenes. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 81- 
90, 133-140, 150-164, 191-199, 219-229, 246-256, 306-312 and 330-337 in 
streptopain from Streptococcus pyogenes, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 82-87, 134-138, 250-254 and 331-335 
(numbering of amino acids according to SEQ ID NO:25). It is preferred that the 
streptopain from Streptococcus pyogenes or a derivative or homologue thereof is 
used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C19 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C19 class of cysteine proteases and/or has a tertiary structure similar to human 
ubiquitin specific protease 7. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 3-15, 
63-70, 80-86, 248-256, 272-283 and 292-304 in human ubiquitin specific 
protease 7, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 10-15, 251-255, 277-281 and 298-304 (numbering of amino acids 
according to SEQ ID NO:26). It is preferred that the human ubiquitin specific 
protease 7 or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C47 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C47 class of cysteine proteases and/or has a tertiary structure similar to the 
staphopain from Staphylococcus aureus. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 15- 
23, 57-66, 108-119, 142-149 and 157-164 in staphopain from Staphylococcus 
aureus, and more preferably at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
17-22, 111-117, 143-147 and 159-163 (numbering of amino acids according to 
SEQ ID NO:27). It is preferred that the staphopain from Staphylococcus aureus 
or a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C48 class of 
cysteine proteases or has at least 70% identity on the amino acid level to a 
protein of the C48 class of cysteine proteases and/or has a tertiary structure 
similar to the Ulp1 endopeptidase from Saccharomyces cerevisiae. It is preferred 
that SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 40-51, 108-115, 132-141, 173-179 and 597-605 in 
Ulp1 endopeptidase from Saccharomyces cerevisiae, and more preferably at one 
or more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 43-49, 110-113, 133-137 and 
175-178 (numbering of amino acids according to SEQ ID NO:28). It is preferred 
that the Ulp1 endopeptidase from Saccharomyces cerevisiae or a derivative or 
homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the C56 class of cysteine 
proteases or has at least 70% identity on the amino acid level to a protein of the 
C56 class of cysteine proteases and/or has a tertiary structure similar to the PfpI 
endopeptidase from Pyrococcus horikoshii. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 8-16, 
40-47, 66-73, 118-125 and 147-153 in PfpI endopeptidase from Pyrococcus 
horikoshii, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 9-14, 68-71, 120-123 and 148-151 (numbering of amino acids 
according to SEQ ID NO:29). It is preferred that the PfpI endopeptidase from 
Pyrococcus horikoshii or a derivative or homologue thereof is used as the 
scaffold. 

In a further embodiment the protein scaffold belongs to the M4 class of metallo 
proteases or has at least 70% identity on the amino acid level to a protein of the 
M4 class of metallo proteases and/or has a tertiary structure similar to 
thermolysin from Bacillus thermoproteolyticus. It is preferred that SDRs are 
inserted into the protein scaffold at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 106-118, 125-130, 152-160, 197-204, 210-213 and 221-229 in 
thermolysin from Bacillus thermoproteolyticus, and more preferably at one or 
more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 108-115, 126-129, 199-203 and 
223-227 (numbering of amino acids according to SEQ ID NO:30). It is preferred 
that the thermolysin from Bacillus thermoproteolyticus or a derivative or 
homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the M10 class of metallo 
proteases or has at least 70% identity on the amino acid level to a protein of the 
M10 class of metallo proteases and/or has a tertiary structure similar to human 
collagenase. It is preferred that SDRs are inserted into the protein scaffold at one 
or more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 2-7, 68-79, 85-90, 107-111 and 
135-141 in human collagenase, and more preferably at one or more positions 
from the group of positions that correspond structurally or by amino acid 
sequence homology to the regions 3-6, 71-78 and 136-140 (numbering of amino 
acids according to SEQ ID NO:31). It is preferred that human collagenase or a 
derivative or homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have glycosidase activity. A 
particularly suited protein scaffold for this variant is a glycosylase or is derived 
from a glycosylase. Preferably, the tertiary structure belongs to one of the 
following structural classes: class GH13, GH7, GH12, GH11, GH10, GH28, GH26, 
and GH18 (beta/alpha)8 barrel. 

In a first embodiment the protein scaffold belongs to the GH13 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH13 class of glycosylases and/or has a tertiary structure similar to human 
pancreatic alpha-amylase. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 50-60, 100-110, 
148-167, 235-244, 302-310 and 346-359 in human pancreatic alpha-amylase, 
and more preferably at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 51- 
58, 148-155 and 303-309 (numbering of amino acids according to SEQ ID 
NO:32). It is preferred that human pancreatic alpha-amylase or a derivative or 
homologue thereof is used as the scaffold. 
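
For working with the region lists of the individual embodiments, the information given for each scaffold can be held in a small record. The following sketch is illustrative only; the record layout, field names and parsing helper are hypothetical and are not part of the disclosure. It parses a region list as written in the text and stores the values stated above for human pancreatic alpha-amylase.

```python
# Illustrative sketch only; record layout and names are hypothetical.
import re
from dataclasses import dataclass


def parse_regions(text):
    """Parse a list such as '50-60, 100-110 and 148-167' into inclusive tuples."""
    regions = []
    for start, end in re.findall(r"(\d+)\s*-\s*(\d+)", text):
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"invalid region {start}-{end}")
        regions.append((start, end))
    return regions


@dataclass(frozen=True)
class ScaffoldEmbodiment:
    structural_class: str
    reference_protein: str
    seq_id_no: int
    regions: tuple
    preferred_regions: tuple


# Values as stated in the text for the GH13 embodiment (SEQ ID NO:32).
GH13_AMYLASE = ScaffoldEmbodiment(
    structural_class="GH13",
    reference_protein="human pancreatic alpha-amylase",
    seq_id_no=32,
    regions=tuple(parse_regions(
        "50-60, 100-110, 148-167, 235-244, 302-310 and 346-359")),
    preferred_regions=tuple(parse_regions("51-58, 148-155 and 303-309")),
)
```

The parser rejects a range whose start exceeds its end, which helps to catch transcription errors when region lists are entered by hand.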

In a further embodiment the protein scaffold belongs to the GH7 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH7 class of glycosylases and/or has a tertiary structure similar to cellulase 
from Trichoderma reesei. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 47-56, 93-104, 
173-182, 215-223, 229-236 and 322-334 in cellulase from Trichoderma reesei, 
and more preferably at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 175- 
180, 218-222 and 324-332 (numbering of amino acids according to SEQ ID 
NO:33). It is preferred that cellulase from Trichoderma reesei or a derivative or 
homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the GH12 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH12 class of glycosylases and/or has a tertiary structure similar to cellulase 
from Aspergillus niger. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 18-28, 55-60, 
106-113, 126-132 and 149-159 in cellulase from Aspergillus niger, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 20-26, 56-59, 
108-112 and 151-156 (numbering of amino acids according to SEQ ID NO:34). It 
is preferred that cellulase from Aspergillus niger or a derivative or homologue 
thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the GH11 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH11 class of glycosylases and/or has a tertiary structure similar to xylanase 
from Aspergillus niger. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 7-14, 33-39, 88- 
97, 114-126 and 158-167 in xylanase from Aspergillus niger, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 20-26, 56-59, 
108-112 and 151-156 (numbering of amino acids according to SEQ ID NO:35). It 
is preferred that xylanase from Aspergillus niger or a derivative or homologue 
thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the GH10 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH10 class of glycosylases and/or has a tertiary structure similar to xylanase 
from Streptomyces lividans. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 21- 
29, 42-50, 84-92, 130-136, 206-217 and 269-278 in xylanase from 
Streptomyces lividans, and more preferably at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 43-49, 86-90, 208-213 and 271-276 (numbering of 
amino acids according to SEQ ID NO:36). It is preferred that xylanase from 
Streptomyces lividans or a derivative or homologue thereof is used as the 
scaffold. 

In a further embodiment the protein scaffold belongs to the GH28 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH28 class of glycosylases and/or has a tertiary structure similar to 
pectinase from Aspergillus niger. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 82- 
88, 118-126, 171-178, 228-236, 256-264 and 289-299 in pectinase from 
Aspergillus niger, and more preferably at one or more positions from the group 
of positions that correspond structurally or by amino acid sequence homology to 
the regions 116-124, 174-178 and 291-296 (numbering of amino acids according 
to SEQ ID NO:37). It is preferred that pectinase from Aspergillus niger or a 
derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the GH26 class of 
glycosylases or has at least 70% identity on the amino acid level to a protein of 
the GH26 class of glycosylases and/or has a tertiary structure similar to 
mannanase from Pseudomonas cellulosa. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 75- 
83, 113-125, 174-182, 217-224, 247-254, 324-332 and 325-340 in mannanase 
from Pseudomonas cellulosa, and more preferably at one or more positions from 
the group of positions that correspond structurally or by amino acid sequence 
homology to the regions 115-123, 176-180, 286-291 and 328-337 (numbering of 
amino acids according to SEQ ID NO:38). It is preferred that mannanase from 
Pseudomonas cellulosa or a derivative or homologue thereof is used as the 
scaffold. 

In a further embodiment the protein scaffold belongs to the GH18 (beta/alpha)8 
barrel class of glycosylases or has at least 70% identity on the amino acid level 
to a protein of the GH18 class of glycosylases and/or has a tertiary structure 
similar to chitinase from Bacillus circulans. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 21- 
29, 57-65, 130-136, 176-183, 221-229, 249-257 and 327-337 in chitinase from 
Bacillus circulans, and more preferably at one or more positions from the group 
of positions that correspond structurally or by amino acid sequence homology to 
the regions 59-63, 178-181, 250-254 and 330-336 (numbering of amino acids 
according to SEQ ID NO:39). It is preferred that chitinase from Bacillus circulans 
or a derivative or homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have esterhydrolase activity. 
Preferably, the protein scaffold for this variant has lipase, phosphatase, 
phytase, or phosphodiesterase activity. 

In a first embodiment the protein scaffold belongs to the GX class of esterases or 
has at least 70% identity on the amino acid level to a protein of the GX class of 
esterases and/or has a tertiary structure similar to the structure of the lipase B 
from Candida antarctica. Preferably, the scaffold has lipase activity. It is 
preferred that SDRs are inserted into the protein scaffold at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 139-148, 188-195, 216-224, 256-266 and 
272-287 in lipase B from Candida antarctica, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 141-146, 218-222, 259-263 and 275-283 
(numbering of amino acids according to SEQ ID NO:40). It is preferred that 
lipase B from Candida antarctica or a derivative or homologue thereof is used as 
the scaffold. 

In a further embodiment the protein scaffold belongs to the GX class of esterases 
or has at least 70% identity on the amino acid level to a protein of the GX class 
of esterases and/or has a tertiary structure similar to the pancreatic lipase from 
guinea pig. Preferably, the scaffold has lipase activity. It is preferred that SDRs 
are inserted into the protein scaffold at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 78-90, 91-100, 112-120, 179-186, 207-218, 238-247 and 248-260 
in pancreatic lipase from guinea pig, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 80-87, 114-118, 209-215 and 239-246 
(numbering of amino acids according to SEQ ID NO:41). It is preferred that 
pancreatic lipase from guinea pig or a derivative or homologue thereof is used as 
the scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the alkaline phosphatase from Escherichia coli or has at least 
70% identity on the amino acid level to a protein that has a tertiary structure 
similar to the structure of the alkaline phosphatase from Escherichia coli. 
Preferably, the scaffold has phosphatase activity. It is preferred that SDRs are 
inserted into the protein scaffold at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 110-122, 187-142, 170-175, 186-193, 280-287 and 425-435 in 
alkaline phosphatase from Escherichia coli, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 171-174, 187-191, 282-286 and 426-433 
(numbering of amino acids according to SEQ ID NO:42). It is preferred that 
alkaline phosphatase from Escherichia coli or a derivative or homologue thereof 
is used as the scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the bovine pancreatic desoxyribonuclease I or has at least 70% 
identity on the amino acid level to a protein that has a tertiary structure similar 
to the structure of the bovine pancreatic desoxyribonuclease I. Preferably, the 
scaffold has phosphodiesterase activity. More preferably, a nuclease, and most 
preferably, an unspecific endonuclease or a derivative thereof is used as the 
scaffold. It is preferred that SDRs are inserted into the protein scaffold at one or 
more positions from the group of positions that correspond structurally or by 
amino acid sequence homology to the regions 14-21, 41-47, 72-77, 97-111, 
135-143, 171-178, 202-209 and 242-251 in bovine pancreatic 
desoxyribonuclease I, and more preferably at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 16-19, 42-46, 136-141 and 172-176 (numbering of 
amino acids according to SEQ ID NO:43). It is preferred that bovine pancreatic 
desoxyribonuclease I or human desoxyribonuclease I or a derivative or 
homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzyme has transferase activity. A 
particularly suited protein scaffold for this variant is a glycosyl-, a phospho- or a 
methyltransferase, or is a derivative thereof. Particularly preferred protein 
scaffolds for this variant are glycosyltransferases or are derived from 
glycosyltransferases. The tertiary structure of the protein scaffold can be of any 
type. Preferably, however, the tertiary structure belongs to one of the following 
structural classes: GH13 and GT1. 

In a first embodiment the protein scaffold belongs to the GH13 class of 
transferases or has at least 70% identity on the amino acid level to a protein of 
the GH13 class of transferases and/or has a tertiary structure similar to the 
structure of the cyclomaltodextrin glucanotransferase from Bacillus circulans. 
Preferably, the scaffold has transferase activity, and more preferably a 
glycosyltransferase is used as the scaffold. It is preferred that SDRs are inserted 
into the protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 38- 
48, 85-94, 142-154, 178-186, 259-266, 331-340 and 367-377 in 
cyclomaltodextrin glucanotransferase from Bacillus circulans, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 87-92, 180-185, 
261-264 and 269-275 (numbering of amino acids according to SEQ ID NO:44). It 
is preferred that cyclomaltodextrin glucanotransferase from Bacillus circulans or 
a derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold belongs to the GT1 class of 
transferases or has at least 70% identity on the amino acid level to a protein of 
the GT1 class of transferases and/or has a tertiary structure similar to the 
structure of the glycosyltransferase from Amycolatopsis orientalis A82846. 
Preferably the scaffold has transferase activity, and more preferably 
glycosyltransferase activity. It is preferred that SDRs are inserted into the 
protein scaffold at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 58- 
74, 130-138, 185-193, 228-236 and 314-323 in glycosyltransferase from 
Amycolatopsis orientalis A82846, and more preferably at one or more positions 
from the group of positions that correspond structurally or by amino acid 
sequence homology to the regions 61-71, 230-234 and 316-321 (numbering of 
amino acids according to SEQ ID NO:45). It is preferred that the 
glycosyltransferase from Amycolatopsis orientalis A82846 or a derivative or 
homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have oxidoreductase activity. 
A particularly suited protein scaffold for this variant is a monooxygenase, a 
dioxygenase or an alcohol dehydrogenase, or a derivative thereof. The tertiary 
structure of the protein scaffold can be of any type. 

In a first embodiment the protein scaffold has a tertiary structure similar to the 
structure of the 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp. or 
has at least 70% identity on the amino acid level to a protein that has a tertiary 
structure similar to the structure of the 2,3-dihydroxybiphenyl dioxygenase 
from Pseudomonas sp. Preferably, the scaffold has dioxygenase activity. It is 
preferred that SDRs are inserted into the protein scaffold at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 172-185, 198-206, 231-237, 250-259 
and 282-287 in 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp., and 
more preferably at one or more positions from the group of positions that 
correspond structurally or by amino acid sequence homology to the regions 175- 
182, 200-204, 252-257 and 284-287 (numbering of amino acids according to 
SEQ ID NO:46). It is preferred that the 2,3-dihydroxybiphenyl dioxygenase 
from Pseudomonas sp. or a derivative or homologue thereof is used as the 
scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the catechol dioxygenase from Acinetobacter sp. or has at least 
70% identity on the amino acid level to a protein that has a tertiary structure 
similar to the structure of the catechol dioxygenase from Acinetobacter sp. 
Preferably, the scaffold has dioxygenase activity, and more preferably catechol 
dioxygenase activity. It is preferred that SDRs are inserted into the protein 
scaffold at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 66-72, 105-112, 
156-171 and 198-207 in catechol dioxygenase from Acinetobacter sp., and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 107-110, 161- 
171 and 201-205 (numbering of amino acids according to SEQ ID NO:47). It is 
preferred that the catechol dioxygenase from Acinetobacter sp. or a derivative or 
homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the camphor-5-monooxygenase from Pseudomonas putida or 
has at least 70% identity on the amino acid level to a protein that has a tertiary 
structure similar to the structure of the camphor-5-monooxygenase from 
Pseudomonas putida. Preferably, the scaffold has monooxygenase activity, and 
more preferably camphor monooxygenase activity. It is preferred that SDRs are 
inserted into the protein scaffold at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 26-31, 57-63, 84-98, 182-191, 242-256, 292-299 and 392-399 in 
camphor-5-monooxygenase from Pseudomonas putida, and more preferably at 
one or more positions from the group of positions that correspond structurally or 
by amino acid sequence homology to the regions 85-96, 183-188, 244-253, 293- 
298 and 393-398 (numbering of amino acids according to SEQ ID NO:48). It is 
preferred that the camphor-5-monooxygenase from Pseudomonas putida or a 
derivative or homologue thereof is used as the scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the alcohol dehydrogenase from Equus caballus or has at least 
70% identity on the amino acid level to a protein that has a tertiary structure 
similar to the structure of the alcohol dehydrogenase from Equus caballus. 
Preferably, the scaffold has alcohol dehydrogenase activity. It is preferred that 
SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 49-63, 111-112, 294-301 and 361-369 in alcohol 
dehydrogenase from Equus caballus, and more preferably at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 51-61 and 295-299 (numbering of amino 
acids according to SEQ ID NO:49). It is preferred that the alcohol dehydrogenase 
from Equus caballus or a derivative or homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have lyase activity. A 
particularly suited protein scaffold for this variant is an oxoacid lyase or is a 
derivative thereof. Particularly preferred protein scaffolds for this variant are 
aldolases or synthases, or are derived therefrom. The tertiary structure of the 
protein scaffold can be of any type, but a (beta/alpha)8 barrel structure is 
preferred. 

In a first embodiment the protein scaffold has a tertiary structure similar to the 
structure of the N-acetyl-D-neuraminic acid aldolase from Escherichia coli or has at 
least 70% identity on the amino acid level to a protein that has a tertiary 
structure similar to the structure of the N-acetyl-D-neuraminic acid aldolase from 
Escherichia coli. Preferably, the scaffold has aldolase activity. It is preferred that 
SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 45-55, 78-87, 105-113, 137-146, 164-171, 187-193, 
205-210, 244-255 and 269-276 in N-acetyl-D-neuraminic acid aldolase from 
Escherichia coli, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 45-52, 138-144, 189-192, 247-253 and 271-275 (numbering of 
amino acids according to SEQ ID NO:50). It is preferred that the N-acetyl-D-neuraminic 
acid aldolase from Escherichia coli or a derivative or homologue thereof 
is used as the scaffold. 

In a further embodiment the protein scaffold has a tertiary structure similar to 
the structure of the tryptophan synthase from Salmonella typhimurium or has at 
least 70% identity on the amino acid level to a protein that has a tertiary 
structure similar to the structure of the tryptophan synthase from Salmonella 
typhimurium. Preferably, the scaffold has synthase activity. It is preferred that 
SDRs are inserted into the protein scaffold at one or more positions from the 
group of positions that correspond structurally or by amino acid sequence 
homology to the regions 56-63, 127-134, 154-161, 175-193, 209-216 and 230- 
240 in tryptophan synthase from Salmonella typhimurium, and more preferably 
at one or more positions from the group of positions that correspond structurally 
or by amino acid sequence homology to the regions 57-62, 155-160, 178-190 
and 210-215 (numbering of amino acids according to SEQ ID NO:51). It is 
preferred that the tryptophan synthase from Salmonella typhimurium or a 
derivative or homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have isomerase activity. A 
particularly suited protein scaffold for this variant is an enzyme converting 
aldoses or ketoses, or is a derivative thereof. 

In a first embodiment, the protein scaffold has a tertiary structure similar to the 
structure of the xylose isomerase from Actinoplanes missouriensis or has at least 
70% identity on the amino acid level to a protein that has a tertiary structure 
similar to the structure of the xylose isomerase from Actinoplanes missouriensis. 
It is preferred that SDRs are inserted into the protein scaffold at one or more 
positions from the group of positions that correspond structurally or by amino 
acid sequence homology to the regions 18-31, 92-103, 136-147, 178-188 and 
250-257 in xylose isomerase from Actinoplanes missouriensis, and more 
preferably at one or more positions from the group of positions that correspond 
structurally or by amino acid sequence homology to the regions 20-27, 92-99 
and 180-186 (numbering of amino acids according to SEQ ID NO:52). It is 
preferred that the xylose isomerase from Actinoplanes missouriensis or a 
derivative or homologue thereof is used as the scaffold. 

It is further preferred that the engineered enzymes have ligase activity. A
particularly suited protein scaffold for this variant is a DNA ligase, or is a
derivative thereof.

In a first embodiment, the protein scaffold has a tertiary structure similar to the
structure of the DNA ligase from Bacteriophage T7 or has at least 70% identity
on the amino acid level to a protein that has a tertiary structure similar to the
structure of the DNA ligase from Bacteriophage T7. It is preferred that SDRs are
inserted into the protein scaffold at one or more positions from the group of
positions that correspond structurally or by amino acid sequence homology to
the regions 52-60, 94-108, 119-131, 241-248, 255-263 and 302-318 in DNA
ligase from Bacteriophage T7, and more preferably at one or more positions from
the group of positions that correspond structurally or by amino acid sequence
homology to the regions 96-106, 121-129, 256-262 and 304-316 (numbering of
amino acids according to SEQ ID NO: 53). It is preferred that the DNA ligase from
Bacteriophage T7 or a derivative or homologue thereof is used as the scaffold.



WO 2004/113521 



PCT/EP2004/051172 




A second aspect of the invention is directed to the application of engineered 
enzymes with specificities for therapeutic, research, diagnostic, nutritional, 
personal care or industrial purposes. The application comprises at least the 
following steps: 

(a) identification of a target peptide substrate whose hydrolysis has a positive 
effect in connection with the intended purpose, such as curing a disease, 
diagnosing a disease, processing of ingredients for human or animal 
nutrition, or other technical processes; 

(b) provision of an engineered enzyme, the enzyme being specific for the 
target peptide identified in step (a); and 

(c) use of the enzyme as provided in step (b) for the intended purpose. 

In a first variant of this aspect of the invention, the engineered enzyme is used 
as a therapeutic means to inactivate a disease-related target substrate. This 
application comprises at least the following steps: 

(a) identification of a target substrate whose function is connected to a 
disease and whose inactivation has a positive effect in connection with the 
disease, and determination of a target site within the target substrate 
characterized by the fact that modification at the target site leads to the 
inactivation of the target substrate; 

(b) provision of an engineered enzyme, the enzyme being specific for the 
target site identified in step (a); and 

(c) use of the enzyme for the inactivation of the target substrate inside or 
outside the human body. 

In a preferred embodiment the scaffold of the engineered enzyme provided in
step (b) is of human origin in order to avoid or reduce immunogenicity or
allergenic effects associated with the application of the enzyme in the human
body. In a more preferred embodiment of this variant, the scaffold is a
human protease and the modification is hydrolysis of a target site in a protein
target. Preferably, the hydrolysis leads to the activation or inactivation of the
peptide or protein target. Potential peptide or protein targets include cytokines,
growth factors, peptide hormones, interleukins, interferons, enzymes from the
coagulation cascade, serpins, immunoglobulins, soluble or membrane-bound
receptors, cellular or viral surface proteins, peptide drugs and protein drugs.

A particularly preferred embodiment is based on the finding that the engineered
enzyme is capable of cleaving human tumor necrosis factor-alpha (TNF-α).
The engineered enzymes or the fusion protein can thus be used for preparing
medicaments for the treatment of inflammatory diseases (as well as other
diseases connected with TNF-α). Preferably, said engineered enzyme or said
fusion protein is capable of specifically inactivating human tumor necrosis
factor-alpha (hTNF-α), more preferably said engineered enzyme or said fusion
protein is capable of hydrolysing the peptide bond between positions 31/32,
32/33, 44/45, 87/88, 128/129 and/or 141/142 (most preferred between positions
31/32 and 32/33) in hTNF-α (SEQ ID NO:96).

In a further embodiment, the target substrate is a pro-drug which is activated by
the engineered enzyme. In a particular embodiment of this variant, the
engineered enzyme has proteolytic activity and the target substrate is a protein
target which is proteolytically activated. Examples of such pro-drugs are
pro-proteins such as the inactivated forms of coagulation factors. In another
particular variant, the engineered enzyme is an oxidoreductase and the target
substrate is a chemical that can be activated by oxidation.

In a second variant of this aspect of the invention, the engineered enzyme is
used as a technical means in order to catalyze an industrially or nutritionally
relevant reaction with defined specificity. In a particular embodiment of this
variant the engineered enzyme has proteolytic activity, the catalyzed reaction is
a proteolytic processing, and the engineered enzyme specifically hydrolyses one
or more industrially or nutritionally relevant protein substrates. In a preferred
embodiment of this variant the engineered enzyme hydrolyses one or more
industrially or nutritionally relevant protein substrates at specific sites, thereby
leading to industrially or nutritionally desired product properties such as texture,
taste or precipitation characteristics. In a further particular embodiment of this
variant, the engineered enzyme catalyzes the hydrolysis of glycosidic bonds
(glycosidase or glycosylase activity). Then, preferably, the catalyzed reaction is
a polysaccharide processing, and the engineered enzyme specifically hydrolyses
one or more industrially, technically or nutritionally relevant polysaccharide
substrates. In a further particular embodiment of this variant, the engineered
enzyme catalyzes the hydrolysis of triglyceride esters or lipids (lipase activity).
Then, preferably, the catalyzed reaction is a lipid processing step, and the
engineered enzyme specifically hydrolyses one or more industrially, technically or
nutritionally relevant lipid substrates. In a further particular variant of this
embodiment, the engineered enzyme catalyzes the oxidation or reduction of
substrates (oxidoreductase activity). Then, preferably, the engineered enzyme
specifically oxidizes or reduces one or more industrially, technically or
nutritionally relevant chemical substrates.

A third aspect of the invention is directed to a method for generating engineered 
enzymes with specificities that are qualitatively and/or quantitatively novel in 
combination with the protein scaffold. The inventive method comprises at least 
the following steps: 

(a) providing a protein scaffold capable of catalyzing at least one chemical
reaction on at least one target substrate,

(b) generating a library of engineered enzymes or isolated engineered 
enzymes by combining the protein scaffold from step (a) with one or more 
fully or partially random peptide sequences at sites in the protein scaffold that 
enable the resulting engineered enzyme to discriminate between at least one 
target substrate and one or more different substrates and 

(c) selecting out of the library of engineered enzymes generated in step (b) 
one or more enzymes that have defined specificities towards at least one 
target substrate. 

In a first variant of this aspect of the invention, the inventive method comprises 
at least the following steps: 

(a) providing a protein scaffold capable of catalyzing at least one chemical
reaction on at least one target substrate,

(b) generating a library of engineered enzymes or isolated engineered 
enzymes by inserting into the protein scaffold from step (a) one or more fully 
or partially random peptide sequences at sites in the protein scaffold that 
enable the resulting engineered enzyme to discriminate between at least one 
target substrate and one or more different substrates and 

(c) selecting out of the library of engineered enzymes generated in step (b) 
one or more enzymes that have defined specificities towards at least one 
target substrate. 

Preferably, the positions at which the one or more fully or partially random
peptide sequences are combined with or inserted into the protein scaffold are
identified prior to the combination or insertion.

The number of insertions or other combinations of fully or partially random 
peptide sequences as well as their length may vary over a wide range. The 
number is at least one, preferably more than one, more preferably between two 
and eleven, most preferably between two and six. The length of such fully or 
partially random peptide sequences is usually less than 50 amino acid residues. 
Preferably, the length is between one and 15 amino acid residues, more 
preferably between one and six amino acid residues. Alternatively, the length is 
between two and 20 amino acid residues, preferably between two and ten amino 
acid residues, more preferably between three and eight amino acid residues. 
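
As a rough illustration of why the number and length of the randomized
sequences matter, the theoretical diversity of a library grows exponentially
with the total number of randomized residues. The following Python sketch is
not part of the disclosed method; the function name is hypothetical, and it
assumes fully random positions drawn from the 20 standard amino acids.

```python
def library_diversity(sdr_lengths, alphabet_size=20):
    """Theoretical number of distinct peptide combinations.

    sdr_lengths   -- lengths of the randomized sequences, one entry per insertion
    alphabet_size -- residues allowed per position (assumption: 20 standard
                     amino acids, every position fully random)
    """
    total_positions = sum(sdr_lengths)
    return alphabet_size ** total_positions


# Two insertions of six residues each: 20**12 combinations
print(library_diversity([6, 6]))
# One insertion of three residues: 20**3 = 8000 combinations
print(library_diversity([3]))
```

Partially random sequences reduce the effective alphabet size per position, so
the figure above is an upper bound under the stated assumption.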

Preferably such insertions or other combinations are performed on the DNA level, 
using polynucleotides encoding such protein scaffolds and polynucleotides or 
oligonucleotides encoding such fully or partially random peptide sequences. 

Optionally, steps (a) to (c) are repeated cyclically, whereby enzymes selected in 
step (c) serve as the protein scaffold in step (a) of a further cycle, and 
randomized peptide sequences are either inserted or, alternatively, substituted 
for peptide sequences that have been inserted in former cycles. Thereby, the 
number of inserted peptide sequences is either constant or increases over the 
cycles. The cycles are repeated until one or more enzymes with the intended 
specificities are generated. 

Moreover, during or after one or more rounds of steps (a) to (c), the scaffold 
may be mutated at one or more positions in order to make the scaffold more 
acceptable for the combination with SDR sequences, and/or to increase catalytic 
activity at a specific pH and temperature, and/or to change the glycosylation 
pattern, and/or to decrease sensitivity towards enzyme inhibitors, and/or to 
change enzyme stability. 

In a second variant of this aspect of the invention, the inventive method 
comprises at least the following steps: 

(a) providing a first protein scaffold fragment,

(b) connecting said protein scaffold fragment via a peptide linkage with a first 
SDR, and optionally 

(c) connecting the product of step (b) via a peptide linkage with a further SDR 
peptide or with a further protein scaffold fragment, and optionally 

(d) repeating step (c) for as many cycles as necessary in order to generate a 
sufficiently specific enzyme, and 

(e) selecting out of the population generated in steps (a) - (d) one or more 
enzymes that have the desired specificities toward the one or more target 
substrates. 

Protein scaffold fragment means a part of the sequence of a protein scaffold. A 
protein scaffold is comprised of at least two protein scaffold fragments. 

In a third variant of this aspect of the invention, the protein scaffold, the SDRs 
and the engineered enzyme are encoded by a DNA sequence and an expression 
system is used in order to produce the protein. In an alternative variant, the 
protein scaffold, the SDRs and/or the engineered enzyme are chemically 
synthesized from peptide building blocks. 

In a fourth variant of this aspect of the invention, the inventive method 
comprises at least the following steps: 

(a) providing a polynucleotide encoding a protein scaffold capable of catalyzing 
one or more chemical reactions on one or more target substrates; 

(b) combining one or more fully or partially random oligonucleotide sequences
with the polynucleotide encoding the protein scaffold, the fully or partially 
random oligonucleotide sequences being located at sites in the polynucleotide 
that enable the encoded engineered enzyme to discriminate between the one or 
more target substrates and one or more other substrates; and 

(c) selecting out of the population generated in step (b) one or more 
polynucleotides that encode enzymes that have the defined specificities toward 
the one or more target substrates. 

Any enzyme can serve as the protein scaffold in step (a). It can be a naturally
occurring enzyme, a variant or a truncated derivative thereof, or an engineered
enzyme. For human therapeutic use, the protein scaffold is preferably a
mammalian enzyme, and more preferably a human enzyme. In that aspect, the
invention is directed to a method for the generation of essentially mammalian,
especially of essentially human enzymes with specificities that are different from
specificities of any enzyme encoded in mammalian genomes or in the human
genome, respectively.

According to the invention, the protein scaffold provided in step (a) of this aspect
must be capable of catalyzing one or more chemical reactions on a target
substrate. Therefore, a protein scaffold is selected from the group of potential
protein scaffolds by its activity on the target substrate.

In a preferred variant of this aspect of the invention, a protein scaffold with
hydrolase activity is used. Preferably, a protein scaffold with proteolytic activity
is used, and more preferably, a protease with very low specificity having basic
activity on the target substrate is used as the protein scaffold. Examples of
proteases from different structural classes with low substrate specificity are
Papain, Trypsin, Chymotrypsin, Subtilisin, SET (trypsin-like serine protease from
Streptomyces erythraeus), Elastase, Cathepsin G or Chymase. Before being
employed as the protein scaffold, the amino acid sequence of the protease may
be modified in order to change protein properties other than specificity, e.g.
catalytic activity, stability, inhibitor sensitivity, or expression yield, essentially as
described in WO 92/18645, or in order to change specificity, essentially as
described in EP 02020576.3 and PCT/EP03/04864.

Lipases are another option for a feasible protein scaffold. Hepatic lipase,
lipoprotein lipase and pancreatic lipase belong to the "lipoprotein lipase
superfamily", which in turn is an example of the GX-class of lipases (M. Fischer,
J. Pleiss (2003), Nucl. Acids Res., 31, 319-321). The substrate specificity of
lipases can be characterized by their relative activity towards triglycerol esters of
fatty acids and phospholipids, bearing a charged head group. Alternatively, other
hydrolases such as esterases, glycosylases, amidases, or nitrilases may be used
as scaffolds.

Transferases are also feasible protein scaffolds. Glycosyltransferases are involved
in many biological syntheses involving a variety of donors and acceptors.
Alternatively, the protein scaffold may have ligase, lyase, oxidoreductase, or
isomerase activity.

In a first embodiment, the one or more fully or partially random peptide 
sequences are inserted at specific sites in the protein scaffold. These insertion 
sites are characterized by the fact that the inserted peptide sequences can act as 
discriminators between different substrates, i.e. as Specificity Determining 
Regions or SDRs. Such insertion sites can be identified by several approaches. 
Preferably, insertion sites are identified by analysis of the three-dimensional 
structure of the protein scaffolds, by comparative analysis of the primary 
sequences of the protein scaffold with other enzymes having different 
quantitative specificities, or experimentally by techniques such as alanine 
scanning, random mutagenesis, or random deletion, or by any combination 
thereof. 

A first approach to identify insertion sites for SDRs is based on the
three-dimensional structure of the protein scaffold as it can be obtained by X-ray
crystallography or by nuclear magnetic resonance studies. Structural alignment
of the protein scaffold in comparison with other enzymes of the same structural 
class but having different quantitative specificities reveals regions of high 
structural similarity and regions with low structural similarity. Such an analysis 
can for example be done using public software such as Swiss PDB viewer (Guex, 
N. and Peitsch, M.C. (1997) Electrophoresis 18, 2714-2723). Regions of low 
structural similarity are preferred SDR insertion sites. 

In a second approach to identify insertion sites for SDRs, three-dimensional 
structures of the scaffold protein in complex with competitive inhibitors or 
substrate analogs are analysed. It is assumed that the binding site of a 
competitive inhibitor significantly overlaps with the binding site of the substrate. 
In that case, atoms of the protein that are within a certain distance of atoms of
the inhibitor are likely to be at a similar distance from the substrate as well.
Choosing a short distance, e.g. < 5 Å, will result in an ensemble of protein atoms
that are in close contact with the substrate. The corresponding residues
constitute the first shell contacts and are therefore preferred insertion sites for
SDRs. Once first shell contacts have been identified, second shell contacts can
be found by repeating the distance analysis starting from first shell atoms. In
yet another alternative of the invention the distance analysis described above is
performed starting from the active site residues.
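
The distance analysis described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
procedure actually used: atoms are given as plain (residue number, coordinate)
tuples rather than read from a structure file, the function names are
hypothetical, and "first shell atoms" is interpreted as all atoms of the first
shell residues.

```python
import math


def residues_within(protein_atoms, probe_coords, cutoff=5.0):
    """Residue numbers having at least one atom closer than cutoff (in Å)
    to any probe atom.

    protein_atoms -- list of (residue_number, (x, y, z)) tuples
    probe_coords  -- list of (x, y, z) tuples, e.g. the inhibitor atoms
    """
    hits = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, probe) < cutoff for probe in probe_coords):
            hits.add(residue)
    return hits


def contact_shells(protein_atoms, inhibitor_coords, cutoff=5.0):
    """Return (first_shell, second_shell) residue sets."""
    first = residues_within(protein_atoms, inhibitor_coords, cutoff)
    # Repeat the analysis starting from the atoms of the first shell residues,
    # looking only at residues that are not already in the first shell.
    first_coords = [xyz for residue, xyz in protein_atoms if residue in first]
    remaining = [(residue, xyz) for residue, xyz in protein_atoms
                 if residue not in first]
    second = residues_within(remaining, first_coords, cutoff)
    return first, second


# Toy coordinates on a line; the residue numbers are arbitrary labels.
atoms = [(57, (1.0, 0.0, 0.0)), (102, (4.0, 0.0, 0.0)),
         (195, (8.0, 0.0, 0.0)), (214, (20.0, 0.0, 0.0))]
inhibitor = [(0.0, 0.0, 0.0)]
print(contact_shells(atoms, inhibitor))  # ({57, 102}, {195})
```

Starting the same analysis from active site residues instead of an inhibitor
only requires passing their atom coordinates as the probe.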

In a third approach to identify insertion sites for SDRs, the primary sequence of
the scaffold protein is aligned with other enzymes of the same structural class
but having different quantitative specificities using an alignment algorithm.
Examples of such alignment algorithms are published (Altschul, S.F., Gish, W.,
Miller, W., Myers, E.W. & Lipman, D.J. (1990) J. Mol. Biol. 215:403-410;
"Statistical methods in Bioinformatics: an introduction" by Ewens, W. & Grant,
G.R. 2001, Springer, New York). Such an alignment may reveal conserved and
non-conserved regions with varying sequence homology, and, in particular,
additional sequence elements in one or more enzymes compared to the scaffold
protein. Conserved regions are more likely to contribute to phenotypes shared
among the different proteins, e.g. stabilizing the three-dimensional fold.
Non-conserved regions and, in particular, additional sequences in enzymes with
quantitatively higher specificity (Turner, R. et al. (2002) J. Biol. Chem., 277,
33068-33074) are preferred insertion sites for SDRs.
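
The search for additional sequence elements can be illustrated with a short
Python sketch. It assumes that a pairwise alignment has already been computed
by some alignment program and is available as two gapped strings of equal
length, with "-" as the gap character; the function name and the toy sequences
are hypothetical.

```python
def extra_stretches(scaffold_aln, specific_aln, min_len=4):
    """Find stretches present in the specific enzyme but absent in the scaffold.

    Returns a list of (scaffold_position, inserted_sequence) tuples, where
    scaffold_position is the number of scaffold residues preceding the stretch.
    min_len=4 corresponds to "longer than three amino acids".
    """
    if len(scaffold_aln) != len(specific_aln):
        raise ValueError("aligned sequences must have equal length")
    stretches = []
    scaffold_pos = 0  # scaffold residues seen so far
    current = []      # residues of the stretch being collected
    for s, t in zip(scaffold_aln, specific_aln):
        if s == "-":
            # Gap in the scaffold: collect the extra residue, if there is one.
            if t != "-":
                current.append(t)
            continue
        # A scaffold residue ends any stretch collected so far.
        if len(current) >= min_len:
            stretches.append((scaffold_pos, "".join(current)))
        current = []
        scaffold_pos += 1
    if len(current) >= min_len:
        stretches.append((scaffold_pos, "".join(current)))
    return stretches


# Toy alignment: the four-residue stretch is reported, the single residue is not.
print(extra_stretches("ACD----EFG-HIK", "ACDWWYYEFGPHIK"))  # [(3, 'WWYY')]
```

The reported scaffold positions are then candidate insertion sites in the
sense of the third approach.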

For proteases, five families are currently known, namely aspartic-, cysteine-,
serine-, metallo- and threonine proteases. Each family includes groups of
proteases that share a similar fold. Crystallographic structures of members of
these groups have been solved and are accessible through public databases, e.g.
the Brookhaven protein database (H.M. Berman et al. Nucleic Acids Research, 28
pp. 235-242 (2000)). Such databases also include structural homologs in other
enzyme classes and nonenzymatically active proteins of each class. Several tools
are available to search public databases for structural homologues: SCOP - a
structural classification of proteins database for the investigation of sequences
and structures (Murzin A. G. et al. (1995) J. Mol. Biol. 247, 536-540); CATH -
Class, Architecture, Topology and Homologous superfamily: a hierarchical
classification of protein domain structures (Orengo et al. (1997) Structure 5(8)
1093-1108); FSSP - Fold classification based on structure-structure alignment of
proteins (Holm and Sander (1998) Nucl. Acids Res. 26, 316-319); or VAST -
Vector alignment search tool (Gibrat, Madej and Bryant (1996) Current Opinion
in Structural Biology 6, 377-385).

In the above described approaches, members of structural classes are compared
in order to identify insertion sites for SDRs.

In a preferred variant of these approaches serine proteases of the structural
class S1 are compared with each other. Trypsin represents a member with low
substrate specificity, as it requires only an arginine or lysine residue at the P1
position. On the other hand, thrombin, tissue-type plasminogen activator or
enterokinase all have a high specificity towards their substrate sequences, i.e.
(L/I/V/F)XPR^NA, CPGR^WGG and DDDK^, respectively (Perona, J. & Craik, C.
(1997) J. Biol. Chem., 272, 29987-29990; Perona, J. & Craik, C. (1995) Protein
Science, 4, 337-360). An alignment of the amino acid sequences of these
proteases is described in example 1 (Figure 2) along with the identification of
SDRs.
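
The difference between low and high specificity can be made concrete by
counting the potential cleavage sites in a substrate. The Python sketch below
is a simplified illustration only: it encodes the trypsin requirement (arginine
or lysine at P1) and the enterokinase sequence DDDK^ quoted above as regular
expressions, ignores all other determinants of cleavage, and uses a made-up
substrate sequence.

```python
import re

# Recognition patterns ending at the P1 residue, as quoted in the text above.
PATTERNS = {
    "trypsin": r"[RK]",        # any arginine or lysine at P1
    "enterokinase": r"DDDK",   # DDDK^
}


def cleavage_sites(sequence, pattern):
    """1-based positions of the P1 residues; cleavage occurs after them."""
    return [match.end() for match in re.finditer(pattern, sequence)]


substrate = "MKADDDDKGR"  # hypothetical substrate, for illustration only
print(cleavage_sites(substrate, PATTERNS["trypsin"]))       # [2, 8, 10]
print(cleavage_sites(substrate, PATTERNS["enterokinase"]))  # [8]
```

In this toy substrate the unspecific pattern matches three sites, whereas the
specific pattern matches a single one.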

A further example within the family of serine proteases is given by members of
the structural class S8 (subtilisin fold). Subtilisin is the type protease for this
class and represents an unspecific protease (Ottesen, M. & Svendsen, A. (1998)
Methods Enzymol. 19, 199-215). Furin, PC1 and PC5 are proteases of the same
structural class involved in the processing of propeptides and have a high
substrate specificity (Seidah, N. & Chretien, M. (1997) Curr. Opin. Biotech., 8:
602-607; Bergeron, F. et al. (2000) J. Mol. Endocrin., 24:1-22). In a preferred
variant of the approach, alignments of the primary amino acid sequences (Figure
4) are used to identify eleven sequence stretches longer than three amino acids
which the specific proteases have in addition compared to subtilisin and which
are therefore potential specificity determining regions. In a further variant of
the approach, information from the three-dimensional structure of subtilisin can
be used in order to further narrow down the selection (Figure 3). Out of the
eleven inserted sequence stretches, three are especially close to the active site
residues, namely stretch number 7, 8 and 11 which are insertions in PC5, PC1
and all three specific proteases, respectively (Figure 3). In a preferred variant,
one or several amino acid stretches of variable length and composition can be
inserted into the subtilisin sequence at one or several of the eleven positions. In
a more preferred variant of the approach the insertion is performed at regions 7,
8 or 11 or any combination thereof. In another preferred variant of the approach
protease scaffolds other than subtilisin from the structural class S8 are used.

In a further preferred variant of this approach, aspartic acid proteases of the
structural class A1 are analyzed (Rawlings, N.D. & Barrett, A.J. (1995) Methods
Enzymol. 248, 105-120; Chitpinityol, S. & Crabbe, M.J. (1998), Food Chemistry,
61, 395-418). Examples for the A1 structural class of aspartic proteases are
pepsin with a low substrate specificity as well as beta-secretase
(Gruninger-Leitch, F., et al. (2002) J. Biol. Chem. 277, 4687-4693) and renin
(Wang, W. & Liang, T.C. (1994) Biochemistry, 33, 14636-14641) with relatively
high substrate specificities.
Retroviral proteases also belong to this class, although the active enzyme is a
dimer of two identical subunits. The viral proteases are essential for the correct
processing of the polyprotein precursor to generate functional proteins which
requires a high substrate specificity in each case (Wu, J. et al. (1998)
Biochemistry, 37, 4518-4526; Pettit, S. et al. (1991) J. Biol. Chem., 266,
14539-14547). Pepsin is the type protease for this class and represents an
unspecific protease (Kageyama, T. (2002) Cell. Mol. Life Sci. 59, 288-306).
Beta-secretase and Cathepsin D (Aguilar, C. F. et al. (1995) Adv. Exp. Med. Biol.
362, 155-166) are proteases of the same structural class and have a high
substrate specificity. In a preferred variant of the approach, alignments of the
primary amino acid sequences (Figure 6) are used to identify six sequence
stretches longer than three amino acids which are inserted in the specific
proteases compared to pepsin and are therefore potential specificity determining
regions. In a further variant of the approach, information from the
three-dimensional structure of beta-secretase can be used in order to further
narrow down the selection. Out of the
six inserted sequence stretches, three are especially close to the active site
residues, namely stretch number 1, 3 and 4 which are insertions in cathepsin D
and beta-secretase, respectively (Figure 5). In a preferred variant of the
approach, one or several amino acid stretches of variable length and composition
can be inserted into the pepsin sequence at one or several of the six positions. In
a more preferred embodiment of the invention the insertion is performed at the
positions 1, 3 or 4 or any combination thereof. In another preferred embodiment
of the invention protease scaffolds other than pepsin are used.

There are cases where a certain structural class does not include known
members of both low and high specificity. This is exemplified by the C14 class of
caspases which belong to the cysteine protease family (Rawlings, N.D. & Barrett,
A.J. (1994) Methods Enzymol. 244, 461-486) and which all show high specificity
for P4 to P1 positions. For example, caspase-1, caspase-3 and caspase-9
recognize the sequences YVAD^, DEVD^ and LEHD^, respectively. Identification
of the regions that differ between the caspases will include the regions
responsible for the differences in substrate specificity (Figures 7 and 8).

Finally, non-enzymatic proteins of the same fold as the enzyme scaffold may also
contribute to the identification of insertion sites for SDRs. For example,
haptoglobin (Arcoleo, J. & Greer, J. (1982) J. Biol. Chem. 257, 10063-10068)
and azurocidin (Almeida, R. et al. (1991) Biochem. Biophys. Res. Commun. 177,
688-695) share the same chymotrypsin-like fold with all S1 proteases. Due to
substitutions in the active site residues these proteins do not possess any
proteolytic function, yet they show high homology with active proteases.
Differences between these proteins and specific proteases include regions that
can serve as insertion sites for SDRs.

In a fourth approach, insertion sites for SDRs are identified experimentally by 
techniques such as alanine scanning, random mutagenesis, random insertion or 
random deletion. In contrast to the approach disclosed above, this approach 
does not require detailed knowledge about the three-dimensional structure of the 
scaffold protein. In one preferred variant of this approach, random mutagenesis 
of enzymes with relatively high specificity from the same structural class as the 
protein scaffold and screening for loss or change of specificity can be used to 
identify insertion sites for SDRs in the protein scaffold. 

Random mutagenesis, alanine scanning, random insertion or random deletion are 
all done on the level of the polynucleotides encoding the enzymes. There are a 
variety of protocols known in the literature (e.g. Sambrook, J.F; Fritsch, E.F.; 
Maniatis,T.; Cold Spring Harbor Laboratory Press, Second Edition, 1989, New 
York). For example, random mutagenesis can be achieved by the use of a 
polymerase as described in patent WO 9218645. According to this patent, the 
one or more genes encoding the one or more proteases are amplified by use of a 
DNA polymerase with a high error rate or under conditions that increase the rate 
of misincorporations. For example, the method of Cadwell and Joyce can be
employed (Cadwell, R.C. and Joyce, G.F., PCR Methods Appl. 2 (1992) 28-33).
Other methods of random mutagenesis such as, but not limited to, the use of
mutator strains, chemical mutagens or UV-radiation can be employed as well.
Alternatively, oligonucleotides can be used for mutagenesis that substitute
randomly distributed amino acid residues with an alanine. This method is
generally referred to as alanine scanning mutagenesis (Fersht, A.R. Biochemistry
(1989) 8031-8036). As a further alternative, modifications of the alanine
scanning mutagenesis such as binomial mutagenesis (Gregoret, L.M. and Sauer,
R.T. PNAS (1993) 4246-4250) or combinatorial alanine scanning (Weiss et al.,
PNAS (2000) 8950-8954) can be employed.
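
At the sequence level, an alanine scan amounts to enumerating variants in
which one residue at a time is replaced by alanine. The following Python
sketch illustrates this enumeration on the protein level only; it does not
design the mutagenic oligonucleotides, and the function name and example
sequence are hypothetical.

```python
def alanine_scan(sequence, positions=None):
    """List single alanine substitutions of a protein sequence.

    sequence  -- one-letter amino acid sequence
    positions -- 1-based positions to scan (default: every position)
    Returns (label, variant_sequence) tuples, e.g. ("G1A", "AAS").
    """
    if positions is None:
        positions = range(1, len(sequence) + 1)
    variants = []
    for pos in positions:
        residue = sequence[pos - 1]
        if residue == "A":
            continue  # already alanine, nothing to substitute
        variant = sequence[:pos - 1] + "A" + sequence[pos:]
        variants.append((f"{residue}{pos}A", variant))
    return variants


print(alanine_scan("GAS"))  # [('G1A', 'AAS'), ('S3A', 'GAA')]
```

Restricting `positions` to a subset mimics scanning of selected or randomly
chosen residues rather than the whole sequence.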

In order to express engineered enzymes, the DNA encoding such engineered 
proteins is ligated into a suitable expression vector by standard molecular cloning 
techniques (e.g. Sambrook, J.F; Fritsch, E.F.; Maniatis, T; Cold Spring Harbor 
Laboratory Press, Second Edition, 1989, New York). The vector is introduced in a 
suitable expression host cell, which expresses the corresponding engineered 
enzyme variant. Particularly suitable expression hosts are bacterial expression 
hosts such as Escherichia coli or Bacillus subtiiis, or yeast expression hosts such 
as Saccharomyces cerevisae or Pichia pastoris, or mammalian expression hosts 
such as Chinese Hamster Ovary (CHO) or Baby Hamster Kidney (BHK) cell lines, 
or viral expression systems such as bacteriophages like M13 or Lambda, or 
viruses such as the Baculovirus expression system. As a further alternative, 
systems for in vitro protein expression can be used. Typically, the DNA is ligated 
into an expression vector behind a suitable signal sequence that leads to 
secretion of the enzyme variants into the extracellular space, thereby allowing 
direct detection of protease activity in the cell supernatant. Particularly suitable 
signal sequences for Escherichia coli are HlyA, for Bacillus subtiiis AprE, NprB, 
Mpr, AmyA, AmyE, Blac, SacB, and for S. cerevisiae Barl, Suc2, Mata, InulA, 
Ggplp. Alternatively, the enzyme variants are expressed intracellular^ and the 
substrates are expressed also intracellular^. Preferably, this is done essentially 
as described in patent application WO 0212543, using a fusion peptide substrate 
comprising two auto-fluorescent proteins linked by the substrate amino-acid 
sequence. As a further alternative, after intracellular expression of the enzyme 
variants, or secretion into the periplasmatic space using signal sequences such 
as DsbA, PhoA, PelB, OmpA, OmpT or gill for Escherichia coli, a permeabilisation 
or lysis step releases the enzyme variants into the supernatant. The destruction 
of the membrane barrier can be forced by the use of mechanical means such as 
ultrasonic, French press, or the use of membrane-digesting enzymes such as 
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lysozyme. As another, further alternative, the genes encoding the enzyme 
variants are expressed cell-free by the use of a suitable cell-free expression 
system. For example, the S30 extract from Escherichia coli cells is used for this 
purpose as described by Lesley et al. (Methods in Molecular Biology 37 (1995) 
265-278). 

The ensemble of gene variants generated and expressed by any of the above 
methods is analyzed with respect to affinity, substrate specificity or 
activity by appropriate assay and screening methods as described in detail for 
example in patent application PCT/EP03/04864. Genes from catalytically active 
variants having reduced specificity in comparison to the original enzyme are 
analyzed by sequencing. Sites at which mutations and/or insertions and/or 
deletions occurred are preferred insertion sites at which SDRs can be inserted 
site-specifically. 

In a second embodiment, the one or more fully or partially random peptide 
sequences are inserted at random sites in the protein scaffold. This modification 
is usually done on the polynucleotide level, i.e. by inserting nucleotide sequences 
into the gene that encodes the protein scaffold. Several methods are available 
that enable the random insertion of nucleotide sequences. Systems that can be 
used for random insertion are for example ligation based systems (Murakami et 
al. Nature Biotechnology 20 (2002) 76-81), systems based on DNA 
polymerisation and transposon based systems (e.g. GPS-M™ mutagenesis 
system, NEB Biolabs; MGS™ mutation generation system, Finnzymes). The 
transposon-based methods employ a transposase-mediated insertion of a 
selectable marker gene that contains at its termini recognition sequences for the 
transposase as well as two sites for a rare cutting restriction endonuclease. Using 
the latter endonuclease one usually releases the selection marker and after 
religation obtains an insertion. Instead of performing the religation one can 
alternatively insert a fragment that has terminal recognition sequences for one or 
two outside cutting restriction endonucleases as well as a selectable marker. After 
ligation, one releases this fragment using the one or two outside cutting 
endonucleases. After creating blunt ends by standard methods one inserts blunt 
ended random fragments at random positions into the gene. 

In a further preferred embodiment, methods for homologous in-vitro 
recombination are used to combine the mutations introduced by the 
above-mentioned methods to generate enzyme populations. Examples of methods that 
can be applied are the Recombination Chain Reaction (RCR) according to patent 
application WO 0134835, the DNA-Shuffling method according to the patent 
application WO 9522625, the Staggered Extension method according to patent 
WO 9842728, or the Random Priming recombination according to patent 
application WO 9842728. Furthermore, methods for non-homologous 
recombination such as the Itchy method can be applied (Ostermeier, M. et al. 
Nature Biotechnology 17 (1999) 1205-1209). 

Upon random insertion of a nucleotide sequence into the protein scaffold one 
obtains a library of different genes encoding enzyme variants. The polynucleotide 
library is subsequently transferred to an appropriate expression vector. Upon 
expression in a suitable host or by use of an in vitro expression system, a library 
of enzymes containing randomly inserted stretches of amino acids is obtained. 

According to step (b) of this third aspect of the invention, one or more fully or 
partially random peptide sequences are inserted into the protein scaffold. The 
actual number of such inserted SDRs is determined by the intended quantitative 
specificity following the relation: the higher the intended specificity is, the more 
SDRs are inserted. Whereas a single SDR enables the generation of moderately 
specific enzymes, two SDRs enable already the generation of significantly specific 
enzymes. However, up to six and more SDRs can be inserted into a protein 
scaffold. A similar relation is valid for the length of the SDRs: the higher the 
intended specificity is, the longer are the SDRs that are to be inserted. SDRs can 
be as short as one to four amino acid residues. They can, however, also be as 
long as 50 amino acid residues. Significant specificity can already be generated 
by the use of SDRs of a length of four to six amino acid residues. 
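
By way of a non-limiting illustration, the relation between the number and length of fully random SDRs and the theoretical peptide diversity can be sketched as follows. The function name and the assumption of 20 equally permitted amino acids per position are illustrative only and are not taken from the description.

```python
def peptide_diversity(sdr_lengths, alphabet_size=20):
    """Number of distinct peptide combinations for fully random SDRs.

    sdr_lengths: iterable with the length (in residues) of each inserted SDR.
    alphabet_size: number of amino acids allowed per position (assumed 20).
    """
    total = 1
    for length in sdr_lengths:
        total *= alphabet_size ** length
    return total

# One SDR of four or six residues, and two SDRs of six and five residues.
for lengths in [(4,), (6,), (6, 5)]:
    print(lengths, f"{peptide_diversity(lengths):.3e}")
```

The rapid growth of this number with SDR count and length is one reason why partially random or template-based libraries, as described below, may be preferred in later rounds.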

The peptide sequences that are inserted can be fully or partially random. In this 
context, fully random means that a set of sequences is inserted in parallel that 
includes sequences that differ from each other in each and every position. 
Partially random means that a set of sequences is inserted in parallel that 
includes sequences that differ from each other in at least one position. This 
difference can be either pair-wise or with respect to a single sequence. For 
example, when regarding an insertion of the length of four amino acids, a partially 
random set could be a set (i) that includes AGGG, GVGG, GGLG, GGGI, or (ii) that 
includes AGGG, VGGG, LGGG and IGGG. Alternatively, random sequences also 
comprise sequences that differ from each other in length. Randomization of the 
peptide sequences is achieved by randomization of the nucleotide sequences that 
are inserted into the gene at the respective sites. Thereby, randomization can be 
achieved by employing mixtures of nucleobases as monomers during chemical 
synthesis of the oligonucleotides. A particularly preferred mixture of monomers 
for a fully random codon that in addition minimizes the probability of stop codons 
is NN(GTC). Alternatively, random oligonucleotides can be obtained by 
fragmentation of DNA into short fragments that are inserted into the gene at the 
respective sites. The source of the DNA to be fragmented may be a synthetic 
oligonucleotide but alternatively may originate from cloned genes, cDNAs, or 
genomic DNA. Preferably, the DNA is a gene encoding an enzyme. The 
fragmentation can, for example, be achieved by random endonucleolytic 
digestion of DNA. Preferably, an unspecific endonuclease such as DNAse I (e.g. 
from bovine pancreas) is employed for the endonucleolytic digestion. 
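
The effect of the NN(GTC) monomer mixture on stop codons can be checked with a short calculation. The following is a minimal sketch assuming the standard genetic code and equimolar base mixtures at each position; the helper names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_pool(first, second, third):
    """All codons obtainable from equimolar base mixtures at positions 1-3."""
    return ["".join(c) for c in product(first, second, third)]

def stop_fraction(first, second, third):
    """Fraction of codons in the pool that are stop codons."""
    pool = codon_pool(first, second, third)
    return sum(CODON_TABLE[c] == "*" for c in pool) / len(pool)

def amino_acids(first, second, third):
    """Set of amino acids (stops excluded) encoded by the pool."""
    return {CODON_TABLE[c] for c in codon_pool(first, second, third)} - {"*"}

for name, third in (("NNN", "ACGT"), ("NN(GTC)", "GTC")):
    frac = stop_fraction("ACGT", "ACGT", third)
    count = len(amino_acids("ACGT", "ACGT", third))
    print(f"{name}: stop fraction = {frac:.4f}, amino acids encoded = {count}")
```

Under these assumptions, excluding adenine from the third position removes two of the three stop codons (TAA and TGA) while all 20 amino acids remain encoded.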

If steps (a) - (c) of the inventive method are repeated cyclically, there are 
different alternatives for obtaining random peptide sequences that are inserted in 
consecutive rounds. Preferably, SDRs that were identified in one round as leading 
to increased specificity of enzyme are used as templates for the random peptide 
sequences that are inserted in the following round. 

In a preferred alternative, the sequences selected in one round are analysed and 
randomized oligonucleotides are generated based on these sequences. This can, 
for example, be achieved by using, in addition to the original nucleotide, a 
certain percentage of a mixture of the other three nucleotide monomers at each 
position in the oligonucleotide synthesis. If, for example, in a first round an SDR 
is identified that has the amino acid sequence ARLT, e.g. encoded by the 
nucleotide sequence GCG CGC CTT ACC, a random peptide sequence inserted in 
this SDR site could be encoded by an oligonucleotide with 70% G, 10% A, 10% T 
and 10% C at the first position, 70% C, 10% G, 10% T and 10% A at the second 
position, etc. This leads at each position approximately in 1 of 3 cases to the 
template amino acid and in 2 of 3 cases to another amino acid. 
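
The doping scheme of this example can be evaluated numerically. The following sketch assumes the standard genetic code, independent incorporation at each position, and the 70/10/10/10 mixture given above; the function names are illustrative. It reports both the probability that the template codon is retained unchanged (0.7 to the third power, about 1 in 3) and the probability that the template amino acid is retained, which can be somewhat higher because of synonymous codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def base_probs(template_base, keep=0.70):
    """Monomer probabilities at one position of the doped oligonucleotide."""
    other = (1.0 - keep) / 3
    return {b: (keep if b == template_base else other) for b in BASES}

def retention(template_codon, keep=0.70):
    """Return (P[codon unchanged], P[encoded amino acid unchanged])."""
    per_pos = [base_probs(b, keep) for b in template_codon]
    target = CODON_TABLE[template_codon]
    p_codon = keep ** 3
    p_aa = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == target:
            p = 1.0
            for pos, base in enumerate(codon):
                p *= per_pos[pos][base]
            p_aa += p
    return p_codon, p_aa

# Template from the example: ARLT encoded by GCG CGC CTT ACC.
for codon in "GCG CGC CTT ACC".split():
    p_codon, p_aa = retention(codon)
    print(codon, CODON_TABLE[codon], round(p_codon, 3), round(p_aa, 3))
```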

In another preferred alternative, the sequences selected in one round are 
analyzed and a consensus library is generated based on these sequences. This 
can, for example, be achieved by using defined mixtures of nucleotides at each 
position in the oligonucleotide synthesis in a way that leads to mixtures of the 
amino acid residues that were identified at each position of the SDR selected in 
the previous round. If, for example, in a first round two SDRs are identified that 
have the amino acid sequences ARLT and VPGS, a consensus library inserted in 
this SDR site in the following round could be encoded by an oligonucleotide with 
the sequence G(C/T)G C(G/C)C (G/T)(G/T)G (A/T)CC. This would correspond to 
the random peptide sequence (A/V)(R/P)(L/G/V/W)(T/S), thereby allowing all 
combinations of the amino acid residues identified in the first round, and, due to 
the degeneracy of the genetic code, allowing in addition to a lower degree 
alternative amino acid residues at some positions. 
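
The consensus oligonucleotide of this example can be expanded and translated to confirm which residues it encodes. The following is a minimal sketch assuming the standard genetic code; the data structure used to represent the degenerate oligonucleotide is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# G(C/T)G C(G/C)C (G/T)(G/T)G (A/T)CC: allowed bases per codon position.
CONSENSUS = [
    ["G", "CT", "G"],
    ["C", "GC", "C"],
    ["GT", "GT", "G"],
    ["AT", "C", "C"],
]

def residues(codon_spec):
    """Set of amino acids encoded by one degenerate codon."""
    return {CODON_TABLE["".join(c)] for c in product(*codon_spec)}

per_position = [sorted(residues(spec)) for spec in CONSENSUS]
library = ["".join(p) for p in product(*per_position)]

print(per_position)
print(len(library), "peptides; contains ARLT:", "ARLT" in library,
      "and VPGS:", "VPGS" in library)
```

The third codon illustrates the additional residues mentioned above: besides L and G from the parental sequences, the mixture also encodes V and W.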

In another preferred alternative, the sequences selected in one round are, 
without previous analysis, recombined using methods for the in vitro 
recombination of polynucleotides, such as the methods described in WO 
01/34835 (the following also provides details of the eighth and ninth aspect of 
the invention). 

After insertion of the partially or fully random sequences into the gene encoding 
the scaffold protein, and eventually ligation of the resulting gene into a suitable 
expression vector using standard molecular cloning techniques (Sambrook, J.F; 
Fritsch, E.F.; Maniatis,T.; Cold Spring Harbor Laboratory Press, Second Edition, 
1989, New York), the vector is introduced in a suitable expression host cell which 
expresses the corresponding enzyme variant. Particularly suitable expression 
hosts are bacterial expression hosts such as Escherichia coli or Bacillus subtilis, 
or yeast expression hosts such as Saccharomyces cerevisiae or Pichia pastoris, or 
mammalian expression hosts such as Chinese Hamster Ovary (CHO) or Baby 
Hamster Kidney (BHK) cell lines, or viral expression systems such as 
bacteriophages like M13, T7 phage or Lambda, or viruses such as the Baculovirus 
expression system. As a further alternative, systems for in vitro protein 
expression can be used. Typically, the DNA is ligated into an expression vector 
behind a suitable signal sequence that leads to secretion of the enzyme variants 
into the extracellular space, thereby allowing direct detection of enzyme activity 
in the cell supernatant. Particularly suitable signal sequences for Escherichia coli 
are ompA, pelB, HlyA, for Bacillus subtilis AprE, NprB, Mpr, AmyA, AmyE, Blac, 
SacB, and for S. cerevisiae Bar1, Suc2, Matα, Inu1A, Ggp1p. Alternatively, the 
enzyme variants are expressed intracellularly and the substrates are also 
expressed intracellularly. For protease variants, this is done essentially as 
described in patent application WO 0212543, using a fusion peptide substrate 
comprising two auto-fluorescent proteins linked by the substrate amino-acid 
sequence. As a further alternative, after intracellular expression of the enzyme 
variants, or secretion into the periplasmic space using signal sequences such 
as DsbA, PhoA, PelB, OmpA, OmpT or gIII for Escherichia coli, a permeabilisation 
or lysis step releases the enzyme variants into the supernatant. The destruction 
of the membrane barrier can be achieved by mechanical means such as 
ultrasonication or a French press, or by the use of membrane-digesting enzymes such as 
lysozyme. As another, further alternative, the genes encoding the enzyme 
variants are expressed cell-free by the use of a suitable cell-free expression 
system. For example, the S30 extract from Escherichia coli cells is used for this 
purpose as described by Lesley et al. (Methods in Molecular Biology 37 (1995) 
265-278). 

After introduction of the vector into host cells, these cells are screened for the 
expression of enzymes with specificity for the intended target substrate. Such 
screening is typically done by separating the cells from each other, in order to 
enable the correlation of genotype and phenotype, and assaying the activity of 
each cell clone after a growth and expression period. Such separation can for 
example be done by distribution of the cells into the compartments of sample 
carriers, e.g. as described in WO 01/24933. Alternatively, the cells are separated 
by streaking on agar plates, by enclosing in a polymer such as agarose, by filling 
into capillaries, or by similar methods. 

Identification of variants with the intended specificity can be done by different 
approaches. In the case of proteases, preferably assays using peptide substrates 
essentially as described in PCT/EP03/04864 are employed. 

Regardless of the expression format, selection of enzyme variants is done under 
conditions that allow identification of enzymes that preferentially recognize and 
convert the target sequence. As a first alternative, enzymes that preferentially 
recognize and convert the target sequence are identified by screening for enzymes 
with a high affinity for the target substrate sequence. High affinity corresponds 
to a low Km, which is selected for by screening at target substrate concentrations 
substantially below the Km of the first enzyme. Preferably, the substrates that are 
used are linked to one or more fluorophores that enable the detection of the 
modification of the substrate at concentrations below 10 µM, preferably below 1 
µM, more preferably below 100 nM, and most preferably below 10 nM. 
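
Why screening at low substrate concentration selects for a low Km can be illustrated with simple Michaelis-Menten kinetics. The following is a sketch only: Michaelis-Menten behaviour is assumed, and the two variants and their kinetic constants are hypothetical values chosen for illustration, not measured data.

```python
def mm_rate(s, kcat, km, enzyme=1.0):
    """Michaelis-Menten initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * s / (km + s)

# Hypothetical variants: identical kcat, Km differing 100-fold (Km in uM).
VARIANTS = {
    "low_Km": {"kcat": 1.0, "km": 1.0},
    "high_Km": {"kcat": 1.0, "km": 100.0},
}

def rate_ratio(s):
    """Rate of the low-Km variant relative to the high-Km variant at [S] = s."""
    return mm_rate(s, **VARIANTS["low_Km"]) / mm_rate(s, **VARIANTS["high_Km"])

for s in (1000.0, 10.0, 0.01):  # substrate concentration in uM
    print(f"[S] = {s:g} uM: rate ratio = {rate_ratio(s):.1f}")
```

At substrate concentrations well above both Km values the two hypothetical variants are nearly indistinguishable, whereas well below Km the rate scales with kcat/Km and the low-Km variant stands out.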

As a second alternative, enzymes that preferentially recognize and convert the 
target substrate are identified by employing two or more substrates in the assay and 
screening for activity on these two or more substrates in comparison. Preferably, 
the two or more substrates employed are linked to different marker molecules, 
thereby enabling the detection of the modification of the two or more substrates 
consecutively or in parallel. In the case of proteases, particularly preferably two 
peptide substrates are employed, one peptide substrate having an arbitrarily 
chosen or even partially or fully random amino-acid sequence thereby enabling 
to monitor the activity on an arbitrary substrate, and the other peptide substrate 
having an amino-acid sequence identical to or resembling the intended target 
substrate sequence thereby enabling to monitor the activity on the target 
substrate. Especially preferably, these two peptide substrates are linked to 
fluorescent marker molecules, and the fluorescent properties of the two peptide 
substrates are sufficiently different in order to distinguish both activities when 
measured consecutively or in parallel. For example, a fusion protein comprising a 
first autofluorescent protein, a peptide, and a second autofluorescent protein 
according to patent application WO 0212543 can be used for this purpose. 
Alternatively, fluorophores such as rhodamines are linked chemically to the 
peptide substrates. 
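
The comparison of activities on two differently labelled substrates can be reduced to a simple ratio per clone. The following sketch is illustrative only: the read-out values, the use of the unmodified scaffold as reference, and the two-fold threshold are assumptions and not part of the description.

```python
def specificity_ratio(target_signal, arbitrary_signal):
    """Activity on the target substrate relative to the arbitrary substrate."""
    return target_signal / arbitrary_signal

def select_hits(clones, reference, min_fold=2.0):
    """Names of clones whose ratio exceeds the reference ratio by min_fold."""
    ref_ratio = specificity_ratio(*reference)
    return [
        name
        for name, (target, arbitrary) in clones.items()
        if arbitrary > 0
        and specificity_ratio(target, arbitrary) / ref_ratio >= min_fold
    ]

# Hypothetical read-outs (arbitrary units): (target channel, arbitrary channel).
reference = (100.0, 100.0)  # e.g. the unmodified scaffold enzyme
clones = {
    "clone_1": (80.0, 10.0),
    "clone_2": (90.0, 95.0),
    "clone_3": (40.0, 15.0),
}
print(select_hits(clones, reference))
```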

As a third alternative, enzymes that preferentially recognize and convert the 
target substrate are identified by employing one or more substrates resembling the 
target substrate together with competing substrates in high excess. Screening 
with respect to activity on the substrates resembling the target substrate is then 
done in the presence of the competing substrates. Enzymes having a specificity 
which corresponds qualitatively to the target specificity, but having only a low 
quantitative specificity, are identified as negative samples in such a screen, 
whereas enzymes having a specificity which corresponds qualitatively and 
quantitatively to the target specificity are identified positively. Preferably, the 
one or more substrates resembling the target substrate are linked to marker 
molecules, thereby enabling the detection of their modifications, whereas the 
competing substrates do not carry marker molecules. The competing substrates 
have arbitrarily chosen or random amino-acid sequences, thereby acting as 
competitive inhibitors for the hydrolysis of the marker-carrying substrates. For 
example, protein hydrolysates such as tryptone can serve as competing 
substrates for engineered proteolytic enzymes according to the invention. 

As a fourth alternative, enzymes that preferentially recognize and convert the 
target substrate are identified and selected by an amplification-coupled or 
growth-coupled selection step. Furthermore, the activity can be measured 
intracellularly and the selection can be done by a cell sorter, such as a 
fluorescence-activated cell sorter. 

As a further alternative, enzymes that recognize and convert the target substrate 
are identified by first selecting enzymes that preferentially bind to the target 
substrate, and secondly selecting out of this subgroup of enzyme variants those 
enzymes that convert the target substrate. Selection for enzymes that 
preferentially bind the target substrate can be either done by selection of binders 
to the target substrate or by counter-selection of enzymes that bind to other 
substrates. Methods for the selection of binders or for the counter-selection of 
non-binders are known in the art. Such methods typically require 
phenotype-genotype coupling, which can be achieved by using surface display expression 
methods. Such methods include, for example, phage or viral display, cell surface 
display and in vitro display. Phage or viral display typically involves fusion of the 
protein of interest to a viral/phage protein. Cell surface display, i.e. either 
bacterial or eukaryotic cell display, typically involves fusion of the protein of 
interest to a peptide or protein that is located at the cell surface. In in-vitro 
display, the protein is typically made in vitro and linked directly or indirectly to 
the mRNA encoding the protein (DE 19646372). 

The invention also provides for a composition or pharmaceutical composition 
comprising one or more engineered enzymes according to the first aspect of the 
invention as defined herein before. The composition may optionally comprise an 
acceptable carrier, excipient and/or auxiliary agent. Non-pharmaceutical 
compositions as defined herein are research compositions, nutritional compositions, 
cleaning compositions, disinfection compositions, cosmetic compositions or 
compositions for personal care. Moreover, DNA sequences coding for the 
engineered enzyme as defined herein before and vectors containing said DNA 
sequences are also provided. Finally, transformed host cells (prokaryotic or 
eukaryotic) or transgenic organisms containing such DNA sequences and/or 
vectors, as well as a method utilizing such host cells or transgenic animals for 
producing the engineered enzyme of the first aspect of the invention are also 
contemplated. 

Detailed description of the figures 

Figure 1: Three-dimensional structure of human trypsin I with the active site 
residues shown in "ball-and-stick" representation and with the marked regions 
indicating potential SDR insertion sites. 

Figure 2: Alignment of the primary amino acid sequences of the human 
proteases trypsin I, alpha-thrombin and enteropeptidase, all of which belong to 
the structural class S1 of the serine protease family. Trypsin represents an 
unspecific protease of this structural class, while alpha-thrombin and 
enteropeptidase are proteases with high substrate specificity. Compared to 
trypsin, several regions of insertions of three or more amino acids into the 
primary sequence of alpha-thrombin and enteropeptidase (enterokinase) are seen. The region marked 
with (-1-) and the region marked with (-3-) are preferred SDR insertion sites. In 
the tertiary structure of alpha-thrombin both regions are in the vicinity of the 
substrate binding site. These regions therefore fulfil two criteria to be selected 
as candidates for SDRs: firstly, they represent insertions in the specific proteases 
compared to the unspecific one and, secondly, they are close to the substrate 
binding site. A representation of the three-dimensional structure is given in 
figure 3. 

Figure 3: Three-dimensional structure of subtilisin with the active site residues 
being shown in "ball-and-stick" representation and with the numbered regions 
indicating potential SDR insertion sites. 

Figure 4: Alignment of the primary amino acid sequences of subtilisin E, furin, 
PC1 and PC5, all of which belong to the structural class S8 of the serine protease 
family. Subtilisin E represents an unspecific protease of this structural class, 
while furin, PC1 and PC5 are proteases with high substrate specificity. Compared 
to subtilisin, several regions of insertions of three or more amino acids into the 
primary sequence of furin, PC1 and PC5 are seen. The regions marked with (-4-), 
(-5-), (-7-), (-9-) and (-11-) are preferred SDR insertion sites. These regions 
fulfil two criteria to be selected as candidates for SDRs: firstly, they 
represent insertions in the specific proteases compared to the unspecific one 
and, secondly, they are close to the active site residues. 

Figure 5: Three-dimensional structure of beta-secretase with the active site 
residues being shown in "ball-and-stick" representation and with the numbered 
regions indicating potential SDR insertion sites. 

Figure 6: Alignment of the primary amino acid sequences of pepsin, beta-secretase 
and cathepsin D, all of which belong to the structural class A1 of the aspartic 
protease family. Pepsin represents an unspecific protease of this structural class, 
while beta-secretase and cathepsin D are proteases with high substrate specificity. 
Compared to pepsin, several regions of insertions of three or more amino acids 
into the primary sequence of beta-secretase and cathepsin D are seen. The regions 
marked with -1- to -11- correspond to possible SDR combining sites and are 
also marked in Fig. 5. 

Figure 7: Three-dimensional structure of caspase 7 with the active 
site residues being shown in "ball-and-stick" representation and with the 
numbered regions indicating potential SDR insertion sites. 

Figure 8: Primary amino acid sequence of caspase 7 as a member of 
the cysteine protease class C14 family (see also SEQ ID NO: 14). 

Figure 9: Schematic representation of method according to the third aspect of 
the invention. 

Figure 10: Western blot analysis of trypsin expression. Supernatants of cell 
cultures expressing variants of trypsin are compared to negative controls. Lane 
1: molecular weight standard; lane 2: negative control; lane 3: supernatant of 
variant a; lane 4: negative control; lane 5: supernatant of variant b. A primary 
antibody specific to the expressed protein and a secondary antibody for 
generation of the signal were used. 

Figure 11: Time course of the proteolytic cleavage of a target substrate. 
Supernatant of cells containing the vector with the gene for human trypsin and 
that of cells containing the vector without the gene was incubated with the 
peptide substrate described in the text. Cleavage of the peptide results in a 
decreased read out value. Proteolytic activity is confirmed for the positive clone. 

Figure 12: Relative activity of three engineered proteolytic enzymes in 
comparison with human trypsin I on two different peptide substrates. A time 
course of the proteolytic digestion of the two substrates was performed and 
evaluated. Substrate B was used for screening and substrate A is a closely 
related sequence. Relative activity of the three variants was normalized to the 
activity of human trypsin I. Variants 1 and 2 clearly show increased specificity 
towards the target substrate. Variant 3, on the other hand, serves as a negative 
control with similar activities as the human trypsin I. 

Figure 13: Relative specificities of trypsin and variants of engineered proteolytic 
enzymes with one or two SDRs, respectively. Activity of the proteases was 
determined in the presence and absence of competitor substrate, i.e. peptone at 
a concentration of 10 mg/ml. Time courses for the proteolytic cleavage were 
recorded and the time constants k determined. The ratios between the time 
constants with and without competitor were formed and represent a quantitative 
measure for the specificity of the protease. The ratios were normalized to 
trypsin. The specificity of the variant containing two SDRs is 2.5 fold higher than 
that of the variant with SDR2 alone. 
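
The evaluation described for Figure 13 can be sketched as follows. This is an illustration under stated assumptions: the read-out is assumed to decay as a single exponential with rate constant k, the fit is a plain least-squares line through the logarithm of the signal, and all numerical values are synthetic placeholders rather than data from the figure.

```python
import math

def fit_rate_constant(times, signal):
    """Estimate k for signal ~ S0 * exp(-k * t) by least squares on ln(signal)."""
    logs = [math.log(s) for s in signal]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return -sxy / sxx

def specificity(k_with_competitor, k_without_competitor):
    """Ratio of the constants with and without competitor substrate."""
    return k_with_competitor / k_without_competitor

def simulate(k, times):
    """Synthetic, noise-free time course used only for this illustration."""
    return [math.exp(-k * t) for t in times]

times = [0.0, 10.0, 20.0, 30.0, 40.0]

# Synthetic example: the reference enzyme is slowed 10-fold by competitor,
# the hypothetical variant only 2-fold.
ref = specificity(fit_rate_constant(times, simulate(0.01, times)),
                  fit_rate_constant(times, simulate(0.10, times)))
var = specificity(fit_rate_constant(times, simulate(0.04, times)),
                  fit_rate_constant(times, simulate(0.08, times)))

relative_specificity = var / ref  # normalized to the reference enzyme
print(round(relative_specificity, 3))
```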

Figure 14: Relative specificities of protease variants in the absence and 
presence of competitor substrate. The protease variants containing two inserts 
with different sequences and the non-modified scaffold human trypsin I were 
expressed in a suitable host. Activity of the protease variants was determined as 
the cleavage rate of a peptide with the desired target sequence of TNF-alpha in 
the absence and presence of competitor substrate. Specificity is expressed as the 
ratio of cleavage rates in the presence and absence of competitor. 

Figure 15: The figure shows the reduction of cytotoxicity induced by human 
TNF-alpha when incubating the human TNF-alpha with concentrated supernatant from 
cultures expressing the inventive engineered proteolytic enzymes being specific 
for human TNF-alpha. This indicates the efficacy of the inventive engineered 
proteolytic enzymes. 

Figure 16: The figure shows the reduction of cytotoxicity induced by human 
TNF-alpha when incubating the human TNF-alpha with different concentrations of 
purified inventive engineered proteolytic enzyme being specific for human TNF- 
alpha. Variant g comprises Seq ID No:72 as SDR1 and Seq ID No:73 as SDR2. 
This indicates the efficacy of the inventive engineered proteolytic enzymes. 

Figure 17: The figure compares the activity of inventive engineered proteolytic 
enzymes being specific for human TNF-alpha with the activity of human trypsin I 
on two protein substrates: (a) human TNF-alpha; (b) mixture of human serum 
proteins. This indicates the safety of the inventive engineered proteolytic 
enzymes. Variant x corresponds to Seq ID No: 75 comprising the SDRs according 
to Seq ID No. 89 (SDR1) and 95 (SDR2). Variants xi and xii correspond to 
derivatives thereof comprising the same SDR sequences. 

Figure 18: Specific hydrolysis of human VEGF by an engineered proteolytic 
enzyme derived from human trypsin. 

Examples 

In the following examples, materials and methods of the present invention are 
provided including the determination of catalytic properties of enzymes obtained 
by the method. It should be understood that these examples are for illustrative 
purpose only and are not to be construed as limiting this invention in any 
manner. All publications, patents, and patent applications cited herein are hereby 
incorporated by reference in their entirety for all purposes. 

In the experimental examples described below, standard techniques of 
recombinant DNA technology were used that were described in various 
publications, e.g. Sambrook et al. (1989), Molecular Cloning: A Laboratory 
Manual, Cold Spring Harbor Laboratory, or Ausubel et al. (1987), Current 
Protocols in Molecular Biology 1987-1988, Wiley Interscience. Unless otherwise 
indicated, restriction enzymes, polymerases and other enzymes as well as DNA 
purification kits were used according to the manufacturers' specifications. 

Example I: Identification of SDR sites in human trypsin 

Insertion sites for SDRs have been identified in the serine protease human 
trypsin I (structural class S1) by comparison with members of the same 
structural class having a higher sequence specificity. Trypsin represents a 
member with low substrate specificity, as it requires only an arginine or lysine 
residue at the P1 position. On the other hand, thrombin, tissue-type plasminogen 
activator or enterokinase all have a high specificity towards their substrate 
sequences, i.e. (L/I/V/F)XPR^NA, CPGR^VVGG and DDDK^, respectively, where ^ 
marks the cleavage site. The 
primary sequences and tertiary structures of these and further S1 serine 
proteases have been aligned in order to determine regions of low and high 
sequence and structure homology and especially regions that correspond to 
insertions in the sequences of the more specific proteases (Figure 2). Several 
regions of insertions equal to or longer than 3 amino acids, representing potential 
SDR sites, have been identified as indicated in Figure 1. These regions were 
chosen as target sites for the insertion of SDRs in the examples below, e.g. SDR1 
(region one in figure 2, after amino acid 42 according to SEQ ID NO:1) with a 
length of six and SDR2 (region three in figure 2, after amino acid 123 according 
to SEQ ID NO:1) with a length of five amino acids, respectively. 

Example II: Molecular cloning of the human trypsin I gene to be used as scaffold 
protein and expression of the mature protease in B. subtilis 

The gene encoding the unspecific protease human trypsinogen I was cloned into 
the vector pUC18. Cloning was done as follows: the coding sequence of the 
protein was amplified by PCR using primers that introduced a KpnI site at the 5' 
end and a BamHI site at the 3' end. This PCR fragment was cloned into the 
appropriate sites of the vector pUC18. Identity was confirmed by sequencing. 
After sequencing, the coding sequence of the mature protein was amplified by 
PCR using primers that introduced different BglI sites at the 5' end and the 3' 
end. 

This PCR fragment was cloned into the appropriate sites of an E. coli - B. subtilis 
shuttle vector. The vector contains a pMB1 origin for amplification in E. coli, a 
neomycin resistance marker for selection in E. coli, as well as a P43 promoter for 
the constitutive expression in B. subtilis. An 87 bp fragment that contains the 
leader sequence encoding the signal peptide from the sacB gene of B. subtilis 
was introduced behind the P43 promoter. Different BglI restriction sites serve as 
insertion sites for heterologous genes to be expressed. 

Expression of human trypsin I was confirmed by measurement of the proteolytic 
activity in supernatant of cells containing the vector with the gene in comparison 
to a negative control. A peptide including an arginine cleavage site was chosen 
as a substrate. The peptide was N-terminally biotinylated and labeled with a 
fluorophore at the C-terminus. After incubation of the peptide with culture 
supernatant, streptavidin was added. Uncleaved peptide associates with 
streptavidin and leads to a high read out value, while cleavage results in low read 
out values. Figure 11 shows the time course of a proteolytic digestion by supernatant of B. 
subtilis cells containing the vector with the trypsin I gene in comparison to B. 
subtilis cells containing the vector without the trypsin I gene (negative control). 
As a further confirmation of expression of the protease, supernatants of cells 
containing the vector with the gene and control cells were analyzed by 
polyacrylamide gel electrophoresis and subsequent western blot using an 
antibody specific to the target protease. The procedure was performed according 
to standard methods (Sambrook, J.F; Fritsch, E.F.; Maniatis,T.; Cold Spring 
Harbor Laboratory Press, Second Edition, 1989, New York). Figure 10 confirms 
expression of the protein only in the cells harbouring the vector with the gene for 
trypsin. 

Example III: Providing a scaffold protein 

In this example, human trypsin I was used as the scaffold protein. The gene was 
either used in its natural form, or, alternatively, was modified to result in a 
scaffold protein with increased catalytic activity or further improved 
characteristics. 

The modification was done by random modification of the gene, followed by 
expression of the enzyme and subsequent selection for increased activity. First, 
the gene was PCR amplified under error-prone conditions, essentially as 
described by Cadwell, R.C. and Joyce, G.F. (PCR Methods Appl. 2 (1992) 28-33). 
Error-prone PCR was done using 30 pmol of each primer, 20 nmol dGTP and 
dATP, 100 nmol dCTP and dTTP, 20 fmol template, and 5 U Taq DNA polymerase 
in 10 mM Tris HCl pH 7.6, 50 mM KCl, 7 mM MgCl2, 0.5 mM MnCl2, 0.01 % 
gelatin for 20 cycles of 1 min at 94 °C, 1 min at 65 °C and 1 min at 72 °C. The 
resulting DNA library was purified using the QIAquick PCR Purification Kit 
following the suppliers' instructions. The PCR product was digested with the 
restriction enzyme BglI and purified. Afterwards, the PCR product was ligated 
into the E. coli - B. subtilis shuttle vector described above, which had been 
digested with BglI and dephosphorylated. The ligation products were transformed 
into E. coli, amplified in LB, and the plasmids were purified using the Qiagen 
Plasmid Purification Kit following the suppliers' instructions. Resulting plasmids 
were transformed into B. subtilis cells. 

Alternatively, or in addition to random mutagenesis, variants of the gene were 
statistically recombined at homologous positions by use of the Recombination 
Chain Reaction, essentially as described in WO 0134835. PCR products of the 
genes encoding the protease variants were purified using the QIAquick PCR 
Purification Kit following the suppliers' instructions, checked for correct size by 
agarose gel electrophoresis and mixed together in equimolar amounts. 80 µg of 
this PCR mix in 150 mM Tris HCl pH 7.6, 6.6 mM MgCl2 were heated for 5 min at 
94 °C and subsequently cooled down to 37 °C at 0.05 °C/s in order to re-anneal 
strands and thereby produce heteroduplexes in a stochastic manner. Then, 2.5 U 
Exonuclease III per µg DNA were added and incubated for 20, 40 or 60 min at 37 
°C in order to digest different lengths from both 3' ends of the heteroduplexes. 
The partly digested PCR products were refilled with 0.6 U Pfu polymerase per µg 
DNA by incubating for 15 min at 72 °C in 0.17 mM dNTPs and Pfu polymerase 
buffer according to the suppliers' instructions. After performing a single PCR 
cycle, the resulting DNA was purified using the QIAquick PCR Purification Kit 
following the suppliers' instructions, digested with BglI and ligated into the 
linearized vector. The ligation products were transformed into E. coli, amplified in 
LB containing ampicillin as marker, and the plasmids were purified using the 
Qiagen Plasmid Purification Kit following the suppliers' instructions. Resulting 
plasmids were transformed into B. subtilis cells. 

Example IV: Insertion of SDRs into the protein scaffold of human trypsin I and 
generation of an engineered proteolytic enzyme with specificity for a peptide 
substrate having the sequence KKWLGRVPGGPV. 

In order to create insertion sites for SDRs in human trypsin I, two pairs of 
different restriction sites were introduced into the gene at sites that were 
identified as potential SDR sites (see Example I above) without changing the 
amino acid sequence. The insertion of the restriction sites was done by overlap 
extension PCR. Primers restr1 and restr2 were used for the introduction of SacII 
and BamHI restriction sites; restr3 and restr4 were used for the introduction of 
KpnI and NheI restriction sites. The sequences of the primers were as follows: 

Binding site for restr1 and restr2 and the corresponding amino acid sequence 
(SEQ ID NO:54): 

5'-GGTGGTAT CAGCAG GCCACTGCTACAAGTCCC GCATCC AGGT-3' 
VVSAGHCYKSRIQ 

Forward primer restr1 (SEQ ID NO:56): 

5'-GGTGGTATCCGCGGGCCACTGCTACAAGTCCCGGATCCAGGT-3' 

Reverse primer restr2 (SEQ ID NO:57): 

5'-ACCTGGATCCGGGACTTGTAGCAGTGGCCCGCGGATACCACC-3' 

Binding site for restr3 and restr4 and the corresponding amino acid sequence 
(SEQ ID NO:58): 

5'-CCACT GGCACG AAGTGCCTCATCTCTGGCTGGGGCAACACT GCGAGC TCT-3' 
TGTKCLISGWGNTASS 

Forward primer restr3 (SEQ ID NO:60): 

5'-CCACT GGCACG AAGTGCCTCATCTCTGGCTGGGGCAACACT GCGAGC TCT-3' 

Reverse primer restr4 (SEQ ID NO:61): 

5'-AGAGCTAGCAGTGTTGCCCCAGCCAGAGATGAGGCACTTGGTACCAGTGG-3' 



In a first overlap extension PCR, the SacII/BamHI sites were introduced, 
enabling the insertion of SDR1, and in a second overlap extension PCR the 
KpnI/NheI sites, enabling the insertion of SDR2. The product of the overlap 
extension PCR was amplified using primers pUC-forward and pUC-reverse. The 
sequences of pUC-forward and pUC-reverse are as follows: 

pUC-forward (SEQ ID NO:62): 5'-GGGGTACCCCACCACCATGAATCCACTCCT-3' 
pUC-reverse (SEQ ID NO:63): 5'-CGGGATCCGGTATAGAGACTGAAGAGATAC-3' 
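
The primer design described above can be checked computationally. The following is a minimal sketch, not part of the original procedure: it uses the restr1, restr2 and restr4 sequences as printed in the text together with the standard recognition sequences of SacII, BamHI, KpnI and NheI, and it confirms that restr1 and restr2 are reverse complements, that the intended sites are present, and that the exchanges in restr1 are silent. All function and variable names are illustrative assumptions.

```python
# Sketch: sanity checks on the mutagenic primers (names are illustrative).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Standard recognition sequences (all four are palindromic).
SITES = {"SacII": "CCGCGG", "BamHI": "GGATCC", "KpnI": "GGTACC", "NheI": "GCTAGC"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of seq, starting at its first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def sites_in(seq: str) -> set:
    """Names of the restriction sites that occur in seq."""
    return {name for name, site in SITES.items() if site in seq}


# Sequences copied from the text (spaces removed).
binding_site = "GGTGGTATCAGCAGGCCACTGCTACAAGTCCCGCATCCAGGT"
restr1 = "GGTGGTATCCGCGGGCCACTGCTACAAGTCCCGGATCCAGGT"
restr2 = "ACCTGGATCCGGGACTTGTAGCAGTGGCCCGCGGATACCACC"
restr4 = "AGAGCTAGCAGTGTTGCCCCAGCCAGAGATGAGGCACTTGGTACCAGTGG"

print("restr2 is reverse complement of restr1:", reverse_complement(restr1) == restr2)
print("sites in restr1:", sorted(sites_in(restr1)))
print("sites in restr4:", sorted(sites_in(restr4)))
# The reading frame starts at the second base of the binding site.
print("wild type :", translate(binding_site[1:]))
print("with sites:", translate(restr1[1:]))
```

Both translations give the same peptide as listed for the binding site, which is consistent with the statement that the restriction sites were introduced without changing the amino acid sequence.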

The restriction sites generated thereby were subsequently used to insert defined 
or random oligonucleotides into the SDR1 and SDR2 insertion sites by standard 
restriction and ligation methods. Typically, two complementary synthetic 
5'-phosphorylated oligonucleotides were annealed and ligated into a vector 
carrying the modified human trypsin I gene that had been cleaved with the 
respective restriction enzymes. Oligonucleotides encoding SDR1 were inserted via 
the SacII/BamHI sites, whereas oligonucleotides encoding SDR2 were inserted via 
the KpnI/NheI sites. For each insertion an oligonucleotide pair according to the 
following general sequences was used ([P] indicating 5'-phosphorylation, N and X 
indicating any nucleotide or amino acid residue, respectively): 

oligox-SDR1f (SEQ ID NO:64): 

5'-[P]-GGGCCACTGCTAC NNNNNNNNNNNNNNNNNN AAGTCCCG-3' 

oligox-SDR1r (SEQ ID NO:66): 

3'-CGCCCGGTGACGATG NNNNNNNNNNNNNNNNNN TTCAGGGCCTAG-[P]-5' 
GHCYXXXXXXKS 

oligox-SDR2f (SEQ ID NO:67): 

5'-[P]-CAAGTGCCTCATCTCTGGCTGGGGCAAC NNNNNNNNNNNNNNN ACTG-3' 

oligox-SDR2r (SEQ ID NO:69): 

3'-CATGGTTCACGGAGTAGAGACCGACCCCGTTG NNNNNNNNNNNNNNN TGACGATC-[P]-5' 
KCLISGWGNXXXXXT 
As an alternative to the above method, a PCR based method was used for the 
integration of random sequences into the SDR1 and SDR2 insertion sites in the 
modified human trypsin I. For each SDR, one primer was used in which the SDR 
region is fully randomized. Sequences of the primers were as follows (N = 
A/C/G/T, B = C/G/T, V = A/C/G): 

Primer SDR1-mutnnb-forward (SEQ ID NO:70): 

5'-TGGTATCCGCGGGCCACTGCTACNNBNNBNNBNNBNNBNNBAAGTCCCGGATCCAGGTG-3' 

Primer SDR2-mutnnb-reverse (SEQ ID NO:71): 

5'-GGCGCCAGAGCTAGCAGTVNNVNNVNNVNNVNNGTTGCCCCAGCCAGAGATG-3' 

The codon NNB, or VNN in the reverse strand, allows all 20 amino acids to be 
made, but reduces the probability of encoding a stop codon from 0.047 to 0.021. 
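
The stop codon probabilities quoted above follow from simple enumeration of the degenerate codons. The following sketch assumes the standard genetic code and equal frequencies of all bases allowed at each position; the helper names are illustrative.

```python
# Sketch: stop codon frequency of NNN versus NNB (illustrative names).
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degeneracy symbols as defined in the text.
IUPAC = {"N": "ACGT", "B": "CGT", "V": "ACG"}


def expand(degenerate_codon: str) -> list:
    """All concrete codons encoded by a degenerate codon such as 'NNB'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def stop_fraction(degenerate_codon: str) -> float:
    """Fraction of encoded codons that are stop codons."""
    codons = expand(degenerate_codon)
    return sum(CODON_TABLE[c] == "*" for c in codons) / len(codons)


def amino_acids(degenerate_codon: str) -> set:
    """Set of amino acids (stop excluded) reachable with the degenerate codon."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)} - {"*"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print(f"NNN: {stop_fraction('NNN'):.3f}")  # 3 of 64 codons
print(f"NNB: {stop_fraction('NNB'):.3f}")  # 1 of 48 codons (TAG only)
print("amino acids with NNB:", len(amino_acids("NNB")))
# VNN written on the reverse strand corresponds to NNB on the coding strand.
print({reverse_complement(c) for c in expand("VNN")} == set(expand("NNB")))
```

Excluding A at the third codon position removes TAA and TGA, so only TAG remains among the 48 possible codons, while every amino acid still has at least one codon ending in C, G or T.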

As a further alternative, after identification of SDRs that lead to increased 
specificity, these SDRs were used as templates for further randomization. 
Thereby, random peptide sequences were inserted that were partially 
randomized at each position and partially identical at each position to the original 
sequence. 

As an example, random peptide sequences that have in approximately 1 of 3 
cases the template amino acid residue and in approximately 2 of 3 cases any 
other amino acid residue at each position were inserted into the two SDR 
insertion sites of the modified human trypsin I. For this purpose, primers that 
contain at each nucleotide position of the SDR approximately 70% of the 
template base and 30% of a mixture of the three other bases were used. 
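
The relation between 70% template base per nucleotide position and roughly one in three retained residues can be illustrated as follows. This is a sketch under the assumption that the three non-template bases are equally likely (10% each) and that positions are independent; the example codons are arbitrary and the names are illustrative.

```python
# Sketch: residue retention under 70/30 nucleotide doping (illustrative names).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_probability(template: str, codon: str, p_template: float = 0.7) -> float:
    """Probability of synthesizing `codon` when doping around `template`."""
    p_other = (1.0 - p_template) / 3.0  # assumed equal share for other bases
    prob = 1.0
    for t, c in zip(template, codon):
        prob *= p_template if t == c else p_other
    return prob


def retention_probability(template: str, p_template: float = 0.7) -> float:
    """Probability that the doped codon still encodes the template residue."""
    residue = CODON_TABLE[template]
    return sum(
        codon_probability(template, codon, p_template)
        for codon, aa in CODON_TABLE.items()
        if aa == residue
    )


# Unchanged codon: 0.7 ** 3 = 0.343, i.e. roughly 1 in 3.
print(f"{codon_probability('ATG', 'ATG'):.3f}")
# Met has a single codon, so retention equals the unchanged-codon probability.
print(f"{retention_probability('ATG'):.3f}")
# Lys (AAG) is also retained by the synonymous codon AAA: 0.343 + 0.049.
print(f"{retention_probability('AAG'):.3f}")
```

Residues with synonymous codons are retained slightly more often than 0.343, which is in line with the "approximately 1 of 3" stated in the text.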
With each primer pair a PCR was performed under standard conditions using the 
human trypsin I gene as template. The resulting DNA was purified using the 
QIAquick PCR Purification Kit following the suppliers' instructions and digested 
with SacII and NheI. After digestion the DNA was purified and ligated into the 
SacII and NheI digested and dephosphorylated vector. The ligation products 
were transformed into E. coli, amplified in LB containing the respective marker, 
and the plasmids were purified using the Qiagen Plasmid Purification Kit following 
the suppliers' instructions. Resulting plasmids were transformed into B. subtilis 
cells. These cells were then separated to single cells, grown to clones, and after 
expression of the protease gene screened for proteolytic activity. 
The following substrates were employed for screening for proteolytic activity 
(SEQ ID NOs:76 and 77): 
Protease variants were screened on substrate B at complexities of 10* variants 
by confocal fluorescence spectroscopy. The substrate was a peptide biotinylated 
at the N-terminus and fluorescently labeled at the C-terminus. After incubation of 
the peptide with supernatant of cells expressing different variants of the 
protease, streptavidin was added and the samples were analysed by confocal 
fluorimetry. The low concentration of the peptide (20 nM) leads to preferential 
cleavage by proteases with a high kcat/KM value, i.e. proteases with high 
specificity towards the target sequence. 
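
The reasoning behind the low substrate concentration can be made explicit with the Michaelis-Menten rate law: when the substrate concentration is far below KM, the rate reduces to (kcat/KM)·[E]·[S], so variants are ranked by kcat/KM. The sketch below is illustrative only; the kinetic constants and the enzyme concentration are hypothetical values chosen for the example, and only the 20 nM substrate concentration is taken from the text.

```python
# Sketch: at [S] << KM the rate depends only on kcat/KM (hypothetical values).

def mm_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Michaelis-Menten initial rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def low_substrate_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Limiting form for [S] << KM: v = (kcat / KM) * [E] * [S]."""
    return (kcat / km) * enzyme * substrate


SUBSTRATE = 20e-9  # 20 nM peptide, as stated in the text
ENZYME = 1e-9      # hypothetical enzyme concentration (M)

# Two hypothetical variants with equal kcat but different KM.
variants = {
    "variant_low_KM": {"kcat": 10.0, "km": 1e-5},
    "variant_high_KM": {"kcat": 10.0, "km": 1e-4},
}

rates = {}
for name, p in variants.items():
    exact = mm_rate(p["kcat"], p["km"], ENZYME, SUBSTRATE)
    approx = low_substrate_rate(p["kcat"], p["km"], ENZYME, SUBSTRATE)
    rates[name] = exact
    print(f"{name}: exact={exact:.3e} M/s, approx={approx:.3e} M/s")

# The ratio of rates approaches the ratio of the kcat/KM values (10 here).
rate_ratio = rates["variant_low_KM"] / rates["variant_high_KM"]
print(f"rate ratio: {rate_ratio:.2f}")
```

With both hypothetical KM values well above 20 nM, the exact and limiting rates agree to within a fraction of a percent, so the measured cleavage reports directly on kcat/KM.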
Variants selected in the screening procedure were further evaluated for their 
specificity towards substrate B and the closely related substrate A by measuring 
time courses of the proteolytic digestion and determining the rate constants, 
which are proportional to the kcat/KM values. Compared to the human trypsin I 
that was used as scaffold protein, the specific activity of variants 1 and 2 
(SEQ ID NOs: 2 and 3, respectively) is clearly shifted towards substrate B. 
Variant 3 (SEQ ID NO:4), on the other hand, serves as a negative control with 
activities similar to those of human trypsin I. Sequencing of the genes of the 
three variants revealed the following amino acid sequences in the SDRs. 

Table 2 : Sequences of the two SDRs in three different variants selected for 
specific hydrolysis of substrate B (SEQ ID NOs:78-83). 
In a further experiment, a pool of variants containing different numbers of SDRs 
per gene was screened for increased specificity using a mixture of the defined 
substrate and peptone as a competing substrate. Variants containing one or two 
SDRs per gene were analyzed further. As a measure of the specificity, the 
activity in the peptide cleavage assay was compared with and without the 
competing substrate present. The concentration of the competing 
substrate was 10 mg/ml. Under these conditions, unspecific proteases show, 
compared to specific proteases, a stronger decrease in activity with increasing 
competitor concentrations (range between 0 and 100 mg/ml). The ratio of 
proteolytic activity with and without competing substrate is a quantitative 
measure of the specificity of the proteases. Figure 9 shows the relative activities 
with and without competing substrate. Human trypsin I that was used as the 
scaffold protein and two variants, one containing only SDR2 and one containing 
both SDRs, were compared. The specificity of the variant with both SDRs is 
higher by a factor of 2.5 than that of the variant with SDR2 only, confirming that 
there is a direct relation between the number of SDRs and the quantitative 
specificity of the resulting engineered proteolytic enzymes. 
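
The specificity measure used above, the ratio of activity with and without competing substrate, can be written down in a few lines. The activity values in this sketch are hypothetical and are not taken from Figure 9; they were chosen only so that the two variants differ by the factor of 2.5 stated in the text. Function and variable names are illustrative.

```python
# Sketch: specificity as activity ratio with/without competitor.
# All activity values below are hypothetical, not data from Figure 9.

def relative_activity(with_competitor: float, without_competitor: float) -> float:
    """Ratio of cleavage activity with and without competing substrate."""
    if without_competitor <= 0:
        raise ValueError("activity without competitor must be positive")
    return with_competitor / without_competitor


# Hypothetical cleavage rates (arbitrary units): (with, without competitor).
measurements = {
    "human trypsin I (scaffold)": (10.0, 100.0),
    "variant with SDR2 only": (20.0, 100.0),
    "variant with SDR1 and SDR2": (50.0, 100.0),
}

specificity = {
    name: relative_activity(with_c, without_c)
    for name, (with_c, without_c) in measurements.items()
}

for name, value in specificity.items():
    print(f"{name}: relative activity = {value:.2f}")

# Fold difference between the two-SDR and the one-SDR variant.
fold = specificity["variant with SDR1 and SDR2"] / specificity["variant with SDR2 only"]
print(f"fold difference: {fold:.1f}")
```

A ratio close to 1 indicates that the competitor barely affects cleavage of the target peptide, whereas a ratio close to 0 indicates an unspecific protease whose activity is diverted to the competing substrate.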
Example V: Generation of an engineered proteolytic enzyme that specifically 
inactivates human TNF-alpha 

Human trypsin alpha I or a derivative comprising one or more of the following 
amino acid substitutions E56G; R78W; Y131F; A146T; C183R was used as 
protein scaffold for the generation of an engineered proteolytic enzyme with high 
specificity towards human TNF-alpha. The identification of SDR sites in human 
trypsin I or derivatives thereof was done as described above. Two insertion sites 
within the scaffold were chosen for SDRs. The protease variants containing two 
inserts with different sequences, and also the human trypsin I itself with no 
inserts, were expressed in Bacillus subtilis cells. The cells carrying the protease 
variants were separated to single cell clones, and the protease expressing variants 
were screened for proteolytic activity on peptides with the desired target sequence 
of TNF-alpha. The activity of the protease variants was determined as the cleavage 
rate of a peptide with the desired target sequence of TNF-alpha in the absence 
and presence of competitor substrate. The specificity is expressed as the ratio of 
cleavage rates in the presence and absence of competitor (Fig. 14). 

Table 3: Relative specificity of variants of engineered proteolytic enzymes with 
different SDR sequences in absence and presence of competitor substrate (SEQ 
ID NOs: 84-95). 
is shown in Figure 15. By the use of the variants, the induction of apoptosis is 
almost completely eliminated, indicating the anti-inflammatory efficacy of the 
inventive proteases in initiating TNF-alpha breakdown. TNF-alpha was 
incubated with concentrated supernatant from cultures expressing the variants i 
to iii for 2 hours. The resulting TNF-alpha was then incubated with non-modified 
cells for 4 hours. The effect of the remaining TNF-alpha activity was determined 
as the extent of apoptosis induction by detection of activated caspase-3 as a 
marker for apoptotic cells. For the controls, either no protease was added with 
the human TNF-alpha (dead cells) or buffer was used instead of human TNF-alpha 
(live cells). An analogous experiment using purified variant xiii is shown in 
Figure 16. TNF-alpha was incubated with different concentrations 
of the purified inventive protease variant. 

To demonstrate the specificity of the inventive protease variants, proteins from 
human blood serum or purified human TNF-alpha have been incubated with 
human trypsin I or the inventive engineered proteolytic enzyme variants, 
respectively. Here, variant x corresponds to SEQ ID NO: 75 comprising the same 
SDRs as variant f, i.e. SDRs according to SEQ ID NO: 89 (SDR1) and 95 (SDR2). 
Variants xi and xii correspond to derivatives thereof comprising the same SDR 
sequences. Remaining intact protein was determined as a function of time. 
While the variants as well as human trypsin I digest human TNF-alpha, only 
trypsin shows activity on serum protein (Figure 17 a and b). This demonstrates 
the high TNF-alpha specificity of the inventive proteolytic enzymes and indicates 
their safety and accordingly their low side effects for therapeutic use. 

Example VI: Generation of an engineered proteolytic enzyme that specifically 
hydrolyzes human VEGF. 

Human trypsin I was used as protein scaffold for the generation of an engineered 
proteolytic enzyme with high specificity towards human VEGF. The identification 
of SDR sites in human trypsin I was done as described above. Two insertion sites 
within the scaffold were chosen for SDRs. The protease variants containing two 
inserts with different sequences were expressed in Bacillus subtilis cells. The 
cells carrying the protease variants were separated to single cell clones and the 
protease expressing variants were screened as described above. The activity of the 
protease variants was determined as the rate of VEGF cleavage. 4 µg of 
recombinant human VEGF165 was incubated with 0.18 µg of purified protease in 
PBS / pH 7.4 at room temperature. Aliquots were taken at the indicated time 
points and analysed on a polyacrylamide gel. The extent of cleavage was 
quantified by densitometry analysis of the bands. The activity is plotted over 
incubation time in Figure 18. Specific cleavage was controlled by further SDS 
polyacrylamide gel analyses. 
Claims 

1. An engineered enzyme with catalytic activity of defined specificity, 
characterized by a combination of the following components: 

(a) a protein scaffold capable to catalyze at least one chemical reaction on at 
least one target substrate, and 

(b) one or more specificity determining regions (SDRs) located at sites in the 
protein scaffold that enable the resulting engineered protein to discriminate 
between at least one target substrate and one or more different substrates and 
wherein the SDRs are essentially synthetic peptide sequences. 

2. The engineered enzyme according to claim 1, wherein 

(I) the SDRs (b) have a length of less than 50 amino acid residues, preferably 
have a length between two and 20 amino acid residues, more preferably a length 
between two and ten amino acid residues, even more preferably a length 
between three and eight amino acid residues, and wherein the number of SDRs 
is at least one, preferably more than one, more preferably between two and 
eleven, most preferably between two and six; and/or 

(II) the protein scaffold (a) is comprised of one or more polypeptides being 
derived from same or different 

(i) proteins encoded by a gene of viral, prokaryotic or eukaryotic origin, and/or 

(ii) native enzymes, mutated variants or truncated derivatives thereof, and/or 

(iii) mammalian enzymes, preferably human enzymes. 

3. The engineered enzyme according to claim 1 or 2, wherein the protein scaffold 
(a) is derived from an enzyme selected from the group consisting of hydrolases, 
preferably proteases; lipases; glycosylases; transferases, preferably 
glycosyltransferases; oxidoreductases, preferably monooxygenases and 
dioxygenases; lyases; isomerases and ligases, 

more preferably the protein scaffold (a) is derived from a protease selected from 
the group consisting of aspartic, cysteine, serine, metallo and threonine 
proteases, 

even more preferably the protein scaffold (a) is derived from a serine protease of 
the structural class S1, S8, S11, S21, S26, S33 or S51, most preferably from 
class S1 or S8, a cysteine protease of the structural class C1, C2, C4, C10, C14, 
C19, C47, C48 or C56, most preferably from class C14, or an aspartic protease of 
the structural class A1, A2 or A26, most preferably from class A1, or a 
metalloprotease of the structural class M4 or M10. 

4. The engineered enzyme according to claim 3, wherein 

(i) the protein scaffold (a) is derived from a serine protease of the structural 
class S1; and/or 

(ii) the SDRs are located at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
18-25, 38-48, 54-63, 73-86, 122-130, 148-156, 165-171 and 194-204 in human 
trypsin I having the amino acid sequence shown in SEQ ID NO:1, and preferably 
at one or more positions from the group of positions that correspond structurally 
or by amino acid sequence homology to the regions 20-23, 41-45, 57-60, 76-83, 
125-128, 150-153, 167-169 and 197-201 in human trypsin I. 

5. The engineered enzyme according to claim 4, wherein 

(i) the protein scaffold (a) is derived from the serine protease trypsin, preferably 
human trypsin I having the amino acid sequence shown in SEQ ID NO:1, or a 
derivative thereof, or the amino acid sequence SEQ ID NO:1 comprising one or 
more of the following amino acid substitutions E56G, R78W, Y131F, A146T and 
C183R; and 

(ii) at least one of two SDRs is located in the scaffold, a first SDR having a 
length of up to 6 amino acids and being inserted between residues 42 and 43, 
and a second SDR having a length of up to 5 amino acids and being inserted 
between residues 123 and 124 (numbering relative to human trypsin I having 
the amino acid sequence shown in SEQ ID NO:1). 

6. The engineered enzyme according to claim 5, which comprises one of the 
peptide sequences of the following group: SEQ ID NO: 72, 78, 79, 80, 84, 85, 
86, 87, 88, and 89 inserted as the first SDR between residues 42 and 43 and/or 
one of the peptide sequences of the following group: SEQ ID NO: 73, 81, 82, 83, 
90, 91, 92, 93, 94, and 95 inserted as the second SDR between residues 123 
and 124 ; or wherein the engineered enzyme comprises an amino acid sequence 
as shown in SEQ ID NO: 74, or SEQ ID NO: 75. 

7. The engineered enzyme according to claim 3, wherein 

(i) the protein scaffold (a) is derived from a serine protease of the structural 
class S8, and/or 

(ii) the SDRs are located at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
6-17, 25-29, 47-55, 59-69, 101-111, 117-125, 129-137, 139-154, 158-169, 
185-195 and 204-225 in subtilisin E from Bacillus subtilis having the amino acid 
sequence shown in SEQ ID NO:7, and preferably at one or more positions from the group 
of positions that correspond structurally or by amino acid sequence homology to 
the regions 59-69, 101-111, 129-137, 158-169 and 204-225 in subtilisin E from 
Bacillus subtilis. 

8. The engineered enzyme according to claim 3, wherein 

(i) the protein scaffold (a) is derived from an aspartic protease of the structural 
class A1; and/or 

(ii) the SDRs are located at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
6-18, 49-55, 74-83, 91-97, 112-120, 126-137, 159-164, 184-194, 242-247, 
262-267 and 277-300 in human pepsin having the amino acid sequence shown in 
SEQ ID NO: 11, and more preferably at one or more positions from the group of 
positions that correspond structurally or by amino acid sequence homology to 
the regions 10-15, 75-80, 114-118, 130-134, 186-191 and 280-296 in human 
pepsin. 

9. The engineered enzyme according to claim 3, wherein 

(i) the protein scaffold (a) is derived from a cysteine protease of the structural 
class C14; and/or 

(ii) the SDRs are located at one or more positions from the group of positions 
that correspond structurally or by amino acid sequence homology to the regions 
78-91, 144-160, 186-198, 226-243 and 271-291 in human caspase 7 having the 
amino acid sequence of SEQ ID NO: 14, and preferably at one or more positions 
from the group of positions that correspond structurally or by amino acid 
sequence homology to the regions 80-86, 149-157, 190-194 and 233-238 of 
human caspase 7. 

10. A fusion protein which is comprised of at least one engineered enzyme 
according to any of claims 1 to 7 and 

(i) at least one further proteinaceous component, preferably being selected from 
the group consisting of binding domains, receptors, antibodies, regulation 
domains, pro-sequences, and fragments thereof, and/or 

(ii) at least one further functional component, preferably being selected from the 
group consisting of polyethylene glycols, carbohydrates, lipids, fatty acids, nucleic 
acids, metals, metal chelates, and fragments or derivatives thereof. 

11. A nucleic acid molecule that comprises a nucleic acid sequence that encodes 
an enzyme according to any one of claims 1 to 9 or a fusion protein according to 
claim 10. 

12. A vector comprising the nucleic acid of claim 11. 

13. A host cell comprising the vector of claim 12 or comprising the nucleic acid 
molecule of claim 11. 

14. The host cell according to claim 13, which is selected from the group 
consisting of Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Pichia 
pastoris, CHO and BHK. 

15. A method for producing the enzyme of claim 1 or the fusion protein of claim 
10, which comprises cultivating a host cell according to claim 13 or 14. 

16. A method for generating an engineered enzyme according to any one of 
claims 1 to 9 having defined specificity towards at least one target substrate 
comprising at least the following steps: 

(a) providing a protein scaffold which catalyzes at least one chemical reaction on 
at least one target substrate, 

(b) generating a library of engineered enzymes or isolated engineered enzymes 
by combining a polynucleotide encoding the protein scaffold from step (a) with 
one or more fully or partially random synthetic oligonucleotide sequences 
encoding synthetic peptide sequences, at sites in the polynucleotide that enable 
the resulting encoded engineered enzyme to discriminate between at least one 
target substrate and one or more different substrates, expressing said enzymes, 
and 

(c) selecting out of the library of engineered enzymes generated in step (b) one 
or more enzymes that have defined specificities towards at least one target 
substrate. 

17. The method according to claim 16, wherein 

(I) the sites at which the combinations of step (b) are performed are specific 
sites within the protein scaffold, and wherein sites that are suitable as combining 
sites are identified by 

identification of regions close to the active site, preferably by structural analysis 
of complexes of the protein scaffold with competitive inhibitors or substrate 
analogs, and/or 

structural alignment of different enzymes of the same structural class having 
different qualitative or quantitative specificities, and identification of 
heterologous regions, and/or 

comparative analysis of amino acid sequences from enzymes of the same 
structural class having different qualitative or quantitative specificities, and 
identification of heterologous regions, and/or 

experimental analysis comprising mutagenesis techniques such as alanine 
scanning, random mutagenesis, random insertion or random deletion, and 
subsequent identification of regions in the protein scaffold that are essential or 
sensitive for specificity; and/or 

(II) the combining sites of step (b) are randomly distributed over the protein 
scaffold. 

18. The method according to claim 16 or 17, wherein the peptide sequences 
combined in step (b) are fully or partially random and/or have a length variation; 
and/or wherein the selection in step (c) is achieved by screening for enzyme 
activity and/or enzyme affinity 

(i) under low target substrate concentrations, or 

(ii) by using the target substrate and at least one more substrate in comparison, 
or 

(iii) by adding in excess other substrates than the target substrate, thereby using 
the added substrates as competitors, or 



WO 2004/113521 



PCT/EP2004/051172 




(iv) by adding enzyme inhibitors, or 

(v) by selecting enzymes that preferentially bind to the target substrate and 
selecting out of this subgroup those enzymes that convert the substrate, or 

(vi) any combination thereof. 

19. The method according to any of claims 16 to 18, wherein 

(i) the steps (a) to (c) are repeated at least for one further cycle, and with the 
SDRs selected in step (c) of one cycle serving as templates for the randomization 
of protein sequences inserted in step (b) of the further cycle; and/or 

(ii) during or after one or more rounds of steps (a) to (c), the scaffold is mutated 
at one or more positions in order to make the scaffold more acceptable for the 
combination with SDR sequences, and/or to increase catalytic activity at a 
specific pH and temperature, and/or to change the glycosylation pattern, and/or 
to decrease sensitivity towards enzyme inhibitors, and/or to change enzyme 
stability. 

20. The method according to claim 16, which 

(I) comprises at least the following steps: 

(a) providing a first protein scaffold fragment, 

(b) connecting said protein scaffold fragment via a peptide linkage with a first 
SDR , and optionally 

(c) connecting the product of step (b) via a peptide linkage with a further SDR 
peptide or with a further protein scaffold fragment, and optionally 

(d) repeating step (c) for as many cycles as necessary in order to generate a 
sufficiently specific enzyme, and 

(e) selecting out of the population generated in steps (a) - (d) one or more 
enzymes that have the desired specificities toward the one or more target 
substrates; or 

(II) comprises at least the following steps: 

(a) providing a polynucleotide encoding a protein scaffold capable of catalyzing 
one or more chemical reactions on one or more target substrates; 

(b) combining one or more fully or partially random synthetic oligonucleotide 
sequences with the polynucleotide encoding the protein scaffold, the fully or 
partially random synthetic oligonucleotide sequences being located at sites in the 
polynucleotide that enable the encoded engineered enzyme to discriminate 
between the one or more target substrates and one or more other substrates; 
and 

(c) selecting out of the population generated in step (b) one or more 
polynucleotides that encode enzymes that have the desired specificities toward 
the one or more target substrates. 

21. A composition comprising one or more engineered enzymes according to any 
of claims 1 to 9 or a fusion protein according to claim 10, wherein said 
composition is preferably a research composition, nutritional composition, food 
additive composition, cleaning composition, disinfection composition, cosmetic 
composition or composition for personal care, and/or wherein said composition 
optionally comprises acceptable carrier(s) and/or auxiliary agent(s). 

22. Use of an engineered enzyme according to any of claims 1 to 9 or a fusion 
protein according to claim 10 for research, nutritional, personal care or industrial 
purposes. 

Fig. 2 (amino acid sequence alignment of trypsin, alpha-thrombin and enteropeptidase) 




Fig. 3 
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sub 
f urin 
PC_SK1 
PC SK5 



IAHEYAQSV — PY GISQ— IKAPALHSQGY 

VAKRRAKRD — VYQEPTDPKFPQQWYLSGVTQRDLNVKEAWAQGF 

ALNL — FNDPMWNQQWYLQDTRMTAALPKLDL 



EKERSKRSALRDS- 



NTHPCQ — 



•SD — MNIEGAWKRGY 



sub 
furin 
PC_SK1 
PC SK5 



TGSNVKVAVIDSGIDSSHPDL — NVRGGAS — FVPSETN P 

TGHGIWS I LDDGIEKNHPDLAGNYDPGAS — FDVNDQD PDPQ 

HVIPVWQKGI TGKGWITVLDDGLEWNHTDI YANYDPEASYDFNDNDHD P 

TGKNIWTI LDDGIERTHPDL MQNYDA- -LASCDVNGNDLDPMP 

* * * *• 3 



sub YQ DGSS HGTHVAGTIA — AL— NNS I GVLGVSP SASLYAVKVLDS 

furin PRYTQM MDHR HGTRCAGEVA — AVANNGVCGVGVAYWARIGGVRMLD 

PC_SK1 FPRYDPTNENK HGTRCAGEIAMQAN-NHKCGV-GVAYHSKVGGIRMLDG 

PC_SK5 RY DASNENKH GTRC AGEVA — AAANNS HCTVG I AFNAKIGGVRMLDGDVTD 

4 * * * * * * * •*■ _______ 



sub 
furin 
PQ_SKl 
PCSK5 



-TGSGQYSWI INGI E-WAI SHNMDVIHMSLG GPT — GST A LKT- - 

GEVTDAVE ARS -LGLNP NH IHIYSASW GPEDDGKT VDGPARLAEE- - 

-IVTDAIEASSIGFN PGHVDI YSASWGPNDDGKTVEGP GRLA QKAFE 

MVEAKSVSFNPQHVHIYSASWGPDDDGKTVD GPA — PLT RQ-- 

_5 6 8 _ 



sub 
furin 
PC_SK1 
PC SK5 



— WDKAVSSG IWAAAAGNEGSS GSTSTVGYPAKYPST I AVGAV 

— AFFRGVSQGRGGLGSIFVWASGNGGREHDSCHCDGYTMSI-YTLSISSATQFGNV 

YGVKQGRQGKG SI FVWASGNGGRQ GDHCDCD GYTDS I YTI SI 

— AFENGVRMGRRGLGSVFVWASGNGGRSKDHCSCDGYTNSI-YTI SISSTAESGKKPWY 
g * g 



sub     --NSSNQRASFSSAG-SELDVMAPGVSIQSTLPGGTYGAY
furin   --PWYSEACSSTLATTYSSGNQNEKQIVTTDLRQKCTESH
PC_SK1  --SSASQQGLSPWYAEKCSSTLATSYSSG-DYTDQRITSADLHNDCTETH
PC_SK5  LEECSSTLATTYSSG-ESYDKKIITTDLRQRCTDNH

10 * 11 



sub     NGTSMATPHVAGAAALIL--SKHP--TWTNAQVRDRLESTATY--LG-NSFYYGKGLINV
furin   TGTSASAPLAAGIIALTLEANKNL--TWRDMQHLVVQTSKPAH--LN--ADDWATNGVGRK
PC_SK1  TGTSASAPLAAGIFALAL--EANP--NLTWRDMQHLVVWTSEYDPLA-NNPGWKKNGAGL
PC_SK5  TGTSASAPMAAGIIALAL--EANPFLTWRDVQHVIVRTSRAGH--LNANDWKTNAAGFKV



* 



* * 



Fig. 4 
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Fig.5 
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TLVDEQP LENYLDMEYFGTIGIGTPAQDFTWFDTGSSNLWVPSVYCSSL — ACTN 

EMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFL

PAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIACWI
* I * * * * ****** * 2 



HNRFNPEDSSTYQSTSETVSITYGTGSMTGILGYDTVQVGGISDTN

HRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRA

HHKYNSDKSSTYVKNGTSFDIHYGSGSLSGYLSQDTVSVPCQSASSASALGGVKVER

_ «*** 3 * + * 4 



QIFGLSETEPGSFLYYAPFDGILGLAYPSIS--SSGATPVFDNIWNQGLVSQDLFSVYLS
NIAAITESDK-FFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVP-NLFSLQLC

QVFGEATKQPGITFIAAKFDGILGMAYPRIS--VNNVLPVFDNLMQQKLVDQNIFSFYLS
5 **** g ** * * +* * 



ADDKS--GSVVIFGGIDSSYYTGSLNWVPVTVEGYWQITVDSITMNGETI

GAGFPLNQSEVLASV--GGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDL

RDPDAQPGGELMLGGTDSKYYKGSLSYLNVTRKAYWQVHLDQVEVASGLT

7 ** * #• *** g 



A--CAEGC--QAIVDTGTSLLTGPTSPIANIQSDIGASENSDGDMVVSCSAI

KMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLV-CWQA

L--CKEGC--EAIVDTGTSLMVGPVDEVRELQKAIGAVPLIQGEYMIPCEKV

* * * •* * + * * * * g * 



SSLPDIVFTINGVQYPVPPSAYILQSEGSCISGFQGMNVP-TESG

GTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSS

STLPAITLKLGGKGYKLSPEDYTLKVSQAGKTLCLSGFMGMDIP-PPSG



ELWILGDVFIRQYFTVFDRANNQVGLAPVA
TGTVMGAVIMEGFYVVFDRARKRIGFAVSA

PLWILGDVFIGRYYTVFDRDNNRVGFAEAA
* * **** * * * 



Fig. 6 
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Fig. 7 
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01 MLEADDQGCI EEQGVEDSAN EDSVDAKPDR SSFVPSLF5K KKKNVTMRSI KTTRDRVPTY 



61 QYNMNFEKLG KCIIINNKNF DKVTGMGVRN GTDKDAEALF KCFRSLGFDV IVYNDCSCAK
121 MQDLLKKASE EDHTNAACFA CILLSHGEEN VIYGKDGVTP IKDLTAHFRG DRSKTLLEKP 
181 KLFFIQACRG TELDDGIQAD SGPINDTDAN PRYKIPVEAD FLFAYSTVPG YYSWRSPGRG 

241 SWFVQALCSI LEEHGKDLEI MQILTRVNDR VARHFESQSD DPHFHEKKQI PCVVSMLTKE

5 

301 LYFSQ 



Fig. 8 
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Protein scaffold 
Protease A 
Protease B 
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substrate A substrate B 



Fig. 12 
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Fig. 16 
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proteolytic digestion of TNF 
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SEQUENCE LISTING 



<110> DIREVO Biotech AG 



<120> NEW BIOLOGICAL ENTITIES AND USE THEREOF 



<130> O41480wo JH/cw 



<160> 96 



<170> PatentIn version 3.1



<210> 1 

<211> 224 

<212> PRT 

<213> Homo sapiens 

<400> 1 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30

Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln Val
35 40 45

Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu Gly Asn Glu Gln Phe
50 55 60

Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln Tyr Asp Arg Lys Thr
65 70 75 80

Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser Ser Arg Ala Val Ile
85 90 95

Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr Ala Pro Pro Ala Thr
100 105 110

Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser Ser Gly
115 120 125

Ala Asp Tyr Pro Asp Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser
130 135 140

Gln Ala Lys Cys Glu Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met
145 150 155 160

Phe Cys Val Gly Phe Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp
165 170 175

Ser Gly Gly Pro Val Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser



WO 2004/113521

PCT/EP2004/051172

180 185 190

Trp Gly Asp Gly Cys Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys
195 200 205

Val Tyr Asn Tyr Val Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
210 215 220



<210> 2 

<211> 235 

<212> PRT 

<213> artificial sequence 
<220> 

<223> trypsin variant 1 



<400> 2 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30

Gln Trp Val Val Ser Ala Gly His Cys Tyr Asp Ala Val Gly Arg Asp
35 40 45

Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60

Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80

Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95

Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110

Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125

Asn Thr Ile Thr Asn Ser Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140

Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160

Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175

Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190

Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205

Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
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3 



210 215 220

Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230 235



<210> 3 
<211> 235 
<212> PRT 

<213> artificial sequence 
<220> 

<223> trypsin variant 2 
<400> 3 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30

Gln Trp Val Val Ser Ala Gly His Cys Tyr Asn Gly Arg Asp Leu Glu
35 40 45

Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60

Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80

Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95

Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110

Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125

Asn Val Arg Gly Thr Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140

Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160

Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175

Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190

Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205

Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
210 215 220

Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
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225 



230 



235 



<210> 4 
<211> 235 
<212> PRT 

<213> artificial sequence 
<220> 

<223> trypsin variant 3 
<400> 4 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30

Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Thr Asn Gly Asp
35 40 45

Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60

Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80

Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95

Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110

Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125

Asn Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140

Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160

Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175

Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190

Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205

Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
210 215 220

Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230 235
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<210> 5 

<211> 259 

<212> PRT 

<213> Homo sapiens 



<400> 5 

Ile Val Glu Gly Ser Asp Ala Glu Ile Gly Met Ser Pro Trp Gln Val
1 5 10 15

Met Leu Phe Arg Lys Ser Pro Gln Glu Leu Leu Cys Gly Ala Ser Leu
20 25 30

Ile Ser Asp Arg Trp Val Leu Thr Ala Ala His Cys Leu Leu Tyr Pro
35 40 45

Pro Trp Asp Lys Asn Phe Thr Glu Asn Asp Leu Leu Val Arg Ile Gly
50 55 60

Lys His Ser Arg Thr Arg Tyr Glu Arg Asn Ile Glu Lys Ile Ser Met
65 70 75 80

Leu Glu Lys Ile Tyr Ile His Pro Arg Tyr Asn Trp Arg Glu Asn Leu
85 90 95

Asp Arg Asp Ile Ala Leu Met Lys Leu Lys Lys Pro Val Ala Phe Ser
100 105 110

Asp Tyr Ile His Pro Val Cys Leu Pro Asp Arg Glu Thr Ala Ala Ser
115 120 125

Leu Leu Gln Ala Gly Tyr Lys Gly Arg Val Thr Gly Trp Gly Asn Leu
130 135 140

Lys Glu Thr Trp Thr Ala Asn Val Gly Lys Gly Gln Pro Ser Val Leu
145 150 155 160

Gln Val Val Asn Leu Pro Ile Val Glu Arg Pro Val Cys Lys Asp Ser
165 170 175

Thr Arg Ile Arg Ile Thr Asp Asn Met Phe Cys Ala Gly Tyr Lys Pro
180 185 190

Asp Glu Gly Lys Arg Gly Asp Ala Cys Glu Gly Asp Ser Gly Gly Pro
195 200 205

Phe Val Met Lys Ser Pro Phe Asn Asn Arg Trp Tyr Gln Met Gly Ile
210 215 220

Val Ser Trp Gly Glu Gly Cys Asp Arg Asp Gly Lys Tyr Gly Phe Tyr
225 230 235 240

Thr His Val Phe Arg Leu Lys Lys Trp Ile Gln Lys Val Ile Asp Gln
245 250 255

Phe Gly Glu
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<210> 6 

<211> 235 

<212> PRT 

<213> Homo sapiens 



<400> 6 

Ile Val Gly Gly Ser Asn Ala Lys Glu Gly Ala Trp Pro Trp Val Val
1 5 10 15

Gly Leu Tyr Tyr Gly Gly Arg Leu Leu Cys Gly Ala Ser Leu Val Ser
20 25 30

Ser Asp Trp Leu Val Ser Ala Ala His Cys Val Tyr Gly Arg Asn Leu
35 40 45

Glu Pro Ser Lys Trp Thr Ala Ile Leu Gly Leu His Met Lys Ser Asn
50 55 60

Leu Thr Ser Pro Gln Thr Val Pro Arg Leu Ile Asp Glu Ile Val Ile
65 70 75 80

Asn Pro His Tyr Asn Arg Arg Arg Lys Asp Asn Asp Ile Ala Met Met
85 90 95

His Leu Glu Phe Lys Val Asn Tyr Thr Asp Tyr Ile Gln Pro Ile Cys
100 105 110

Leu Pro Glu Glu Asn Gln Val Phe Pro Pro Gly Arg Asn Cys Ser Ile
115 120 125

Ala Gly Trp Gly Thr Val Val Tyr Gln Gly Thr Thr Ala Asn Ile Leu
130 135 140

Gln Glu Ala Asp Val Pro Leu Leu Ser Asn Glu Arg Cys Gln Gln Gln
145 150 155 160

Met Pro Glu Tyr Asn Ile Thr Glu Asn Met Ile Cys Ala Gly Tyr Glu
165 170 175

Glu Gly Gly Ile Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Met
180 185 190

Cys Gln Glu Asn Asn Arg Trp Phe Leu Ala Gly Val Thr Ser Phe Gly
195 200 205

Tyr Lys Cys Ala Leu Pro Asn Arg Pro Gly Val Tyr Ala Arg Val Ser
210 215 220

Arg Phe Thr Glu Trp Ile Gln Ser Phe Leu His
225 230 235



<210> 7

<211> 275

<212> PRT
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<213> Bacillus subtilis 



<400> 7 

Ile Ala His Glu Tyr Ala Gln Ser Val Pro Tyr Gly Ile Ser Gln Ile
1 5 10 15

Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys
20 25 30

Val Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Asn
35 40 45

Val Arg Gly Gly Ala Ser Phe Val Pro Ser Glu Thr Asn Pro Tyr Gln
50 55 60

Asp Gly Ser Ser His Gly Thr His Val Ala Gly Thr Ile Ala Ala Leu
65 70 75 80

Asn Asn Ser Ile Gly Val Leu Gly Val Ser Pro Ser Ala Ser Leu Tyr
85 90 95

Ala Val Lys Val Leu Asp Ser Thr Gly Ser Gly Gln Tyr Ser Trp Ile
100 105 110

Ile Asn Gly Ile Glu Trp Ala Ile Ser Asn Asn Met Asp Val Ile Asn
115 120 125

Met Ser Leu Gly Gly Pro Thr Gly Ser Thr Ala Leu Lys Thr Val Val
130 135 140

Asp Lys Ala Val Ser Ser Gly Ile Val Val Ala Ala Ala Ala Gly Asn
145 150 155 160

Glu Gly Ser Ser Gly Ser Thr Ser Thr Val Gly Tyr Pro Ala Lys Tyr
165 170 175

Pro Ser Thr Ile Ala Val Gly Ala Val Asn Ser Ser Asn Gln Arg Ala
180 185 190

Ser Phe Ser Ser Ala Gly Ser Glu Leu Asp Val Met Ala Pro Gly Val
195 200 205

Ser Ile Gln Ser Thr Leu Pro Gly Gly Thr Tyr Gly Ala Tyr Asn Gly
210 215 220

Thr Ser Met Ala Thr Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu
225 230 235 240

Ser Lys His Pro Thr Trp Thr Asn Ala Gln Val Arg Asp Arg Leu Glu
245 250 255

Ser Thr Ala Thr Tyr Leu Gly Asn Ser Phe Tyr Tyr Gly Lys Gly Leu
260 265 270

Ile Asn Val
275



<210> 8 
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<211> 320 
<212> PRT 

<213> Murinae gen. sp. 



<400> 8 

Val Ala Lys Arg Arg Ala Lys Arg Asp Val Tyr Gln Glu Pro Thr Asp
1 5 10 15

Pro Lys Phe Pro Gln Gln Trp Tyr Leu Ser Gly Val Thr Gln Arg Asp
20 25 30

Leu Asn Val Lys Glu Ala Trp Ala Gln Gly Phe Thr Gly His Gly Ile
35 40 45

Val Val Ser Ile Leu Asp Asp Gly Ile Glu Lys Asn His Pro Asp Leu
50 55 60

Ala Gly Asn Tyr Asp Pro Gly Ala Ser Phe Asp Val Asn Asp Gln Asp
65 70 75 80

Pro Asp Pro Gln Pro Arg Tyr Thr Gln Met Asn Asp Asn Arg His Gly
85 90 95

Thr Arg Cys Ala Gly Glu Val Ala Ala Val Ala Asn Asn Gly Val Cys
100 105 110

Gly Val Gly Val Ala Tyr Asn Ala Arg Ile Gly Gly Val Arg Met Leu
115 120 125

Asp Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser Leu Gly Leu Asn
130 135 140

Pro Asn His Ile His Ile Tyr Ser Ala Ser Trp Gly Pro Glu Asp Asp
145 150 155 160

Gly Lys Thr Val Asp Gly Pro Ala Arg Leu Ala Glu Glu Ala Phe Phe
165 170 175

Arg Gly Val Ser Gln Gly Arg Gly Gly Leu Gly Ser Ile Phe Val Trp
180 185 190

Ala Ser Gly Asn Gly Gly Arg Glu His Asp Ser Cys Asn Cys Asp Gly
195 200 205

Tyr Thr Asn Ser Ile Tyr Thr Leu Ser Ile Ser Ser Ala Thr Gln Phe
210 215 220

Gly Asn Val Pro Trp Tyr Ser Glu Ala Cys Ser Ser Thr Leu Ala Thr
225 230 235 240

Thr Tyr Ser Ser Gly Asn Gln Asn Glu Lys Gln Ile Val Thr Thr Asp
245 250 255

Leu Arg Gln Lys Cys Thr Glu Ser His Thr Gly Thr Ser Ala Ser Ala
260 265 270

Pro Leu Ala Ala Gly Ile Ile Ala Leu Thr Leu Glu Ala Asn Lys Asn
275 280 285

Leu Thr Trp Arg Asp Met Gln His Leu Val Val Gln Thr Ser Lys Pro
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290 295 300 

Ala His Leu Asn Ala Asp Asp Trp Ala Thr Asn Gly Val Gly Arg Lys 
305 310 315 320 



<210> 9 

<211> 330 

<212> PRT 

<213> Homo sapiens 

<400> 9 

Glu Lys Glu Arg Ser Lys Arg Ser Ala Leu Arg Asp Ser Ala Leu Asn
1 5 10 15

Leu Phe Asn Asp Pro Met Trp Asn Gln Gln Trp Tyr Leu Gln Asp Thr
20 25 30

Arg Met Thr Ala Ala Leu Pro Lys Leu Asp Leu His Val Ile Pro Val
35 40 45

Trp Gln Lys Gly Ile Thr Gly Lys Gly Val Val Ile Thr Val Leu Asp
50 55 60

Asp Gly Leu Glu Trp Asn His Thr Asp Ile Tyr Ala Asn Tyr Asp Pro
65 70 75 80

Glu Ala Ser Tyr Asp Phe Asn Asp Asn Asp His Asp Pro Phe Pro Arg
85 90 95

Tyr Asp Pro Thr Asn Glu Asn Lys His Gly Thr Arg Cys Ala Gly Glu
100 105 110

Ile Ala Met Gln Ala Asn Asn His Lys Cys Gly Val Gly Val Ala Tyr
115 120 125

Asn Ser Lys Val Gly Gly Ile Arg Met Leu Asp Gly Ile Val Thr Asp
130 135 140

Ala Ile Glu Ala Ser Ser Ile Gly Phe Asn Pro Gly His Val Asp Ile
145 150 155 160

Tyr Ser Ala Ser Trp Gly Pro Asn Asp Asp Gly Lys Thr Val Glu Gly
165 170 175

Pro Gly Arg Leu Ala Gln Lys Ala Phe Glu Tyr Gly Val Lys Gln Gly
180 185 190

Arg Gln Gly Lys Gly Ser Ile Phe Val Trp Ala Ser Gly Asn Gly Gly
195 200 205

Arg Gln Gly Asp Asn Cys Asp Cys Asp Gly Tyr Thr Asp Ser Ile Tyr
210 215 220

Thr Ile Ser Ile Ser Ser Ala Ser Gln Gln Gly Leu Ser Pro Trp Tyr
225 230 235 240

Ala Glu Lys Cys Ser Ser Thr Leu Ala Thr Ser Tyr Ser Ser Gly Asp
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245 

250 255

Tyr Thr Asp Gln Arg Ile Thr Ser Ala Asp Leu His Asn Asp Cys Thr
260 265 270

Glu Thr His Thr Gly Thr Ser Ala Ser Ala Pro Leu Ala Ala Gly Ile
275 280 285

Phe Ala Leu Ala Leu Glu Ala Asn Pro Asn Leu Thr Trp Arg Asp Met
290 295 300

Gln His Leu Val Val Trp Thr Ser Glu Tyr Asp Pro Leu Ala Asn Asn
305 310 315 320

Pro Gly Trp Lys Lys Asn Gly Ala Gly Leu
325 330



<210> 10 

<211> 297 

<212> PRT 

<213> Homo sapiens 

<400> 10 

Asn Thr His Pro Cys Gln Ser Asp Met Asn Ile Glu Gly Ala Trp Lys
1 5 10 15

Arg Gly Tyr Thr Gly Lys Asn Ile Val Val Thr Ile Leu Asp Asp Gly
20 25 30

Ile Glu Arg Thr His Pro Asp Leu Met Gln Asn Tyr Asp Ala Leu Ala
35 40 45

Ser Cys Asp Val Asn Gly Asn Asp Leu Asp Pro Met Pro Arg Tyr Asp
50 55 60

Ala Ser Asn Glu Asn Lys His Gly Thr Arg Cys Ala Gly Glu Val Ala
65 70 75 80

Ala Ala Ala Asn Asn Ser His Cys Thr Val Gly Ile Ala Phe Asn Ala
85 90 95

Lys Ile Gly Gly Val Arg Met Leu Asp Gly Asp Val Thr Asp Met Val
100 105 110

Glu Ala Lys Ser Val Ser Phe Asn Pro Gln His Val His Ile Tyr Ser
115 120 125

Ala Ser Trp Gly Pro Asp Asp Asp Gly Lys Thr Val Asp Gly Pro Ala
130 135 140

Pro Leu Thr Arg Gln Ala Phe Glu Asn Gly Val Arg Met Gly Arg Arg
145 150 155 160

Gly Leu Gly Ser Val Phe Val Trp Ala Ser Gly Asn Gly Gly Arg Ser
165 170 175

Lys Asp His Cys Ser Cys Asp Gly Tyr Thr Asn Ser Ile Tyr Thr Ile
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180 

185 190

Ser Ile Ser Ser Thr Ala Glu Ser Gly Lys Lys Pro Trp Tyr Leu Glu
195 200 205

Glu Cys Ser Ser Thr Leu Ala Thr Thr Tyr Ser Ser Gly Glu Ser Tyr
210 215 220

Asp Lys Lys Ile Ile Thr Thr Asp Leu Arg Gln Arg Cys Thr Asp Asn
225 230 235 240

His Thr Gly Thr Ser Ala Ser Ala Pro Met Ala Ala Gly Ile Ile Ala
245 250 255

Leu Ala Leu Glu Ala Asn Pro Phe Leu Thr Trp Arg Asp Val Gln His
260 265 270

Val Ile Val Arg Thr Ser Arg Ala Gly His Leu Asn Ala Asn Asp Trp
275 280 285

Lys Thr Asn Ala Ala Gly Phe Lys Val
290 295



<210> 11 

<211> 328 

<212> PRT 

<213> Homo sapiens 

<400> 11 

Thr Leu Val Asp Glu Gln Pro Leu Glu Asn Tyr Leu Asp Met Glu Tyr
1 5 10 15

Phe Gly Thr Ile Gly Ile Gly Thr Pro Ala Gln Asp Phe Thr Val Val
20 25 30

Phe Asp Thr Gly Ser Ser Asn Leu Trp Val Pro Ser Val Tyr Cys Ser
35 40 45

Ser Leu Ala Cys Thr Asn His Asn Arg Phe Asn Pro Glu Asp Ser Ser
50 55 60

Thr Tyr Gln Ser Thr Ser Glu Thr Val Ser Ile Thr Tyr Gly Thr Gly
65 70 75 80

Ser Met Thr Gly Ile Leu Gly Tyr Asp Thr Val Gln Val Gly Gly Ile
85 90 95

Ser Asp Thr Asn Gln Ile Phe Gly Leu Ser Glu Thr Glu Pro Gly Ser
100 105 110

Phe Leu Tyr Tyr Ala Pro Phe Asp Gly Ile Leu Gly Leu Ala Tyr Pro
115 120 125

Ser Ile Ser Ser Ser Gly Ala Thr Pro Val Phe Asp Asn Ile Trp Asn
130 135 140

Gln Gly Leu Val Ser Gln Asp Leu Phe Ser Val Tyr Leu Ser Ala Asp
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145 

150 155 160

Asp Lys Ser Gly Ser Val Val Ile Phe Gly Gly Ile Asp Ser Ser Tyr
165 170 175

Tyr Thr Gly Ser Leu Asn Trp Val Pro Val Thr Val Glu Gly Tyr Trp
180 185 190

Gln Ile Thr Val Asp Ser Ile Thr Met Asn Gly Glu Thr Ile Ala Cys
195 200 205

Ala Glu Gly Cys Gln Ala Ile Val Asp Thr Gly Thr Ser Leu Leu Thr
210 215 220

Gly Pro Thr Ser Pro Ile Ala Asn Ile Gln Ser Asp Ile Gly Ala Ser
225 230 235 240

Glu Asn Ser Asp Gly Asp Met Val Val Ser Cys Ser Ala Ile Ser Ser
245 250 255

Leu Pro Asp Ile Val Phe Thr Ile Asn Gly Val Gln Tyr Pro Val Pro
260 265 270

Pro Ser Ala Tyr Ile Leu Gln Ser Glu Gly Ser Cys Ile Ser Gly Phe
275 280 285

Gln Gly Met Asn Val Pro Thr Glu Ser Gly Glu Leu Trp Ile Leu Gly
290 295 300

Asp Val Phe Ile Arg Gln Tyr Phe Thr Val Phe Asp Arg Ala Asn Asn
305 310 315 320

Gln Val Gly Leu Ala Pro Val Ala
325



<210> 12 

<211> 358 

<212> PRT 

<213> Homo sapiens 

<400> 12 

Glu Met Val Asp Asn Leu Arg Gly Lys Ser Gly Gln Gly Tyr Tyr Val
1 5 10 15

Glu Met Thr Val Gly Ser Pro Pro Gln Thr Leu Asn Ile Leu Val Asp
20 25 30

Thr Gly Ser Ser Asn Phe Ala Val Gly Ala Ala Pro His Pro Phe Leu
35 40 45

His Arg Tyr Tyr Gln Arg Gln Leu Ser Ser Thr Tyr Arg Asp Leu Arg
50 55 60

Lys Gly Val Tyr Val Pro Tyr Thr Gln Gly Lys Trp Glu Gly Glu Leu
65 70 75 80

Gly Thr Asp Leu Val Ser Ile Pro His Gly Pro Asn Val Thr Val Arg
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85 go 95 

Ala Asn Ile Ala Ala Ile Thr Glu Ser Asp Lys Phe Phe Ile Asn Gly
100 105 110

Ser Asn Trp Glu Gly Ile Leu Gly Leu Ala Tyr Ala Glu Ile Ala Arg
115 120 125

Pro Asp Asp Ser Leu Glu Pro Phe Phe Asp Ser Leu Val Lys Gln Thr
130 135 140

His Val Pro Asn Leu Phe Ser Leu Gln Leu Cys Gly Ala Gly Phe Pro
145 150 155 160

Leu Asn Gln Ser Glu Val Leu Ala Ser Val Gly Gly Ser Met Ile Ile
165 170 175

Gly Gly Ile Asp His Ser Leu Tyr Thr Gly Ser Leu Trp Tyr Thr Pro
180 185 190

Ile Arg Arg Glu Trp Tyr Tyr Glu Val Ile Ile Val Arg Val Glu Ile
195 200 205

Asn Gly Gln Asp Leu Lys Met Asp Cys Lys Glu Tyr Asn Tyr Asp Lys
210 215 220

Ser Ile Val Asp Ser Gly Thr Thr Asn Leu Arg Leu Pro Lys Lys Val
225 230 235 240

Phe Glu Ala Ala Val Lys Ser Ile Lys Ala Ala Ser Ser Thr Glu Lys
245 250 255

Phe Pro Asp Gly Phe Trp Leu Gly Glu Gln Leu Val Cys Trp Gln Ala
260 265 270

Gly Thr Thr Pro Trp Asn Ile Phe Pro Val Ile Ser Leu Tyr Leu Met
275 280 285

Gly Glu Val Thr Asn Gln Ser Phe Arg Ile Thr Ile Leu Pro Gln Gln
290 295 300

Tyr Leu Arg Pro Val Glu Asp Val Ala Thr Ser Gln Asp Asp Cys Tyr
305 310 315 320

Lys Phe Ala Ile Ser Gln Ser Ser Thr Gly Thr Val Met Gly Ala Val
325 330 335

Ile Met Glu Gly Phe Tyr Val Val Phe Asp Arg Ala Arg Lys Arg Ile
340 345 350

Gly Phe Ala Val Ser Ala
355



<210> 13 

<211> 351 

<212> PRT 

<213> Homo sapiens 
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<400> 13 

Pro Ala Val Thr Glu Gly Pro Ile Pro Glu Val Leu Lys Asn Tyr Met
1 5 10 15

Asp Ala Gln Tyr Tyr Gly Glu Ile Gly Ile Gly Thr Pro Pro Gln Cys
20 25 30

Phe Thr Val Val Phe Asp Thr Gly Ser Ser Asn Leu Trp Val Pro Ser
35 40 45

Ile His Cys Lys Leu Leu Asp Ile Ala Cys Trp Ile His His Lys Tyr
50 55 60

Asn Ser Asp Lys Ser Ser Thr Tyr Val Lys Asn Gly Thr Ser Phe Asp
65 70 75 80

Ile His Tyr Gly Ser Gly Ser Leu Ser Gly Tyr Leu Ser Gln Asp Thr
85 90 95

Val Ser Val Pro Cys Gln Ser Ala Ser Ser Ala Ser Ala Leu Gly Gly
100 105 110

Val Lys Val Glu Arg Gln Val Phe Gly Glu Ala Thr Lys Gln Pro Gly
115 120 125

Ile Thr Phe Ile Ala Ala Lys Phe Asp Gly Ile Leu Gly Met Ala Tyr
130 135 140

Pro Arg Ile Ser Val Asn Asn Val Leu Pro Val Phe Asp Asn Leu Met
145 150 155 160

Gln Gln Lys Leu Val Asp Gln Asn Ile Phe Ser Phe Tyr Leu Ser Arg
165 170 175

Asp Pro Asp Ala Gln Pro Gly Gly Glu Leu Met Leu Gly Gly Thr Asp
180 185 190

Ser Lys Tyr Tyr Lys Gly Ser Leu Ser Tyr Leu Asn Val Thr Arg Lys
195 200 205

Ala Tyr Trp Gln Val His Leu Asp Gln Val Glu Val Ala Ser Gly Leu
210 215 220

Thr Leu Cys Lys Glu Gly Cys Glu Ala Ile Val Asp Thr Gly Thr Ser
225 230 235 240

Leu Met Val Gly Pro Val Asp Glu Val Arg Glu Leu Gln Lys Ala Ile
245 250 255

Gly Ala Val Pro Leu Ile Gln Gly Glu Tyr Met Ile Pro Cys Glu Lys
260 265 270

Val Ser Thr Leu Pro Ala Ile Thr Leu Lys Leu Gly Gly Lys Gly Tyr
275 280 285

Lys Leu Ser Pro Glu Asp Tyr Thr Leu Lys Val Ser Gln Ala Gly Lys
290 295 300

Thr Leu Cys Leu Ser Gly Phe Met Gly Met Asp Ile Pro Pro Pro Ser
305 310 315 320

Gly Pro Leu Trp Ile Leu Gly Asp Val Phe Ile Gly Arg Tyr Tyr Thr
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325 330 335 

Val Phe Asp Arg Asp Asn Asn Arg Val Gly Phe Ala Glu Ala Ala 

340 345 350 



<210> 14 

<211> 305 

<212> PRT 

<213> Homo sapiens 

<400> 14 

Met Leu Glu Ala Asp Asp Gln Gly Cys Ile Glu Glu Gln Gly Val Glu
1 5 10 15

Asp Ser Ala Asn Glu Asp Ser Val Asp Ala Lys Pro Asp Arg Ser Ser
20 25 30

Phe Val Pro Ser Leu Phe Ser Lys Lys Lys Lys Asn Val Thr Met Arg
35 40 45

Ser Ile Lys Thr Thr Arg Asp Arg Val Pro Thr Tyr Gln Tyr Asn Met
50 55 60

Asn Phe Glu Lys Leu Gly Lys Cys Ile Ile Ile Asn Asn Lys Asn Phe
65 70 75 80

Asp Lys Val Thr Gly Met Gly Val Arg Asn Gly Thr Asp Lys Asp Ala
85 90 95

Glu Ala Leu Phe Lys Cys Phe Arg Ser Leu Gly Phe Asp Val Ile Val
100 105 110

Tyr Asn Asp Cys Ser Cys Ala Lys Met Gln Asp Leu Leu Lys Lys Ala
115 120 125

Ser Glu Glu Asp His Thr Asn Ala Ala Cys Phe Ala Cys Ile Leu Leu
130 135 140

Ser His Gly Glu Glu Asn Val Ile Tyr Gly Lys Asp Gly Val Thr Pro
145 150 155 160

Ile Lys Asp Leu Thr Ala His Phe Arg Gly Asp Arg Ser Lys Thr Leu
165 170 175

Leu Glu Lys Pro Lys Leu Phe Phe Ile Gln Ala Cys Arg Gly Thr Glu
180 185 190

Leu Asp Asp Gly Ile Gln Ala Asp Ser Gly Pro Ile Asn Asp Thr Asp
195 200 205

Ala Asn Pro Arg Tyr Lys Ile Pro Val Glu Ala Asp Phe Leu Phe Ala
210 215 220

Tyr Ser Thr Val Pro Gly Tyr Tyr Ser Trp Arg Ser Pro Gly Arg Gly
225 230 235 240

Ser Trp Phe Val Gln Ala Leu Cys Ser Ile Leu Glu Glu His Gly Lys



245 250 255

Asp Leu Glu Ile Met Gln Ile Leu Thr Arg Val Asn Asp Arg Val Ala
260 265 270

Arg His Phe Glu Ser Gln Ser Asp Asp Pro His Phe His Glu Lys Lys
275 280 285

Gln Ile Pro Cys Val Val Ser Met Leu Thr Lys Glu Leu Tyr Phe Ser
290 295 300

Gln
305



<210> 15 

<211> 262 

<212> PRT 

<213> Streptomyces sp. K15 



<400> 15 

Val Thr Lys Pro Thr Ile Ala Ala Val Gly Gly Tyr Ala Met Asn Asn
1 5 10 15

Gly Thr Gly Thr Thr Leu Tyr Thr Lys Ala Ala Asp Thr Arg Arg Ser
20 25 30

Thr Gly Ser Thr Thr Lys Ile Met Thr Ala Lys Val Val Leu Ala Gln
35 40 45

Ser Asn Leu Asn Leu Asp Ala Lys Val Thr Ile Gln Lys Ala Tyr Ser
50 55 60

Asp Tyr Val Val Ala Asn Asn Ala Ser Gln Ala His Leu Ile Val Gly
65 70 75 80

Asp Lys Val Thr Val Arg Gln Leu Leu Tyr Gly Leu Met Leu Pro Ser
85 90 95

Gly Cys Asp Ala Ala Tyr Ala Leu Ala Asp Lys Tyr Gly Ser Gly Ser
100 105 110

Thr Arg Ala Ala Arg Val Lys Ser Phe Ile Gly Lys Met Asn Thr Ala
115 120 125

Ala Thr Asn Leu Gly Leu His Asn Thr His Phe Asp Ser Phe Asp Gly
130 135 140

Ile Gly Asn Gly Ala Asn Tyr Ser Thr Pro Arg Asp Leu Thr Lys Ile
145 150 155 160

Ala Ser Ser Ala Met Lys Asn Ser Thr Phe Arg Thr Val Val Lys Thr
165 170 175

Lys Ala Tyr Thr Ala Lys Thr Val Thr Lys Thr Gly Ser Ile Arg Thr
180 185 190

Met Asp Thr Trp Lys Asn Thr Asn Gly Leu Leu Ser Ser Tyr Ser Gly
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195 

200 205

Ala Ile Gly Val Lys Thr Gly Ser Gly Pro Glu Ala Lys Tyr Cys Leu
210 215 220

Val Phe Ala Ala Thr Arg Gly Gly Lys Thr Val Ile Gly Thr Val Leu
225 230 235 240

Ala Ser Thr Ser Ile Pro Ala Arg Glu Ser Asp Ala Thr Lys Ile Met
245 250 255

Asn Tyr Gly Phe Ala Leu
260



<210> 16 

<211> 256 

<212> PRT 

<213> Human cytomegalovirus 



<400> 16 

Met Thr Met Asp Glu Gln Gln Ser Gln Ala Val Ala Pro Val Tyr Val
1 5 10 15

Gly Gly Phe Leu Ala Arg Tyr Asp Gln Ser Pro Asp Glu Ala Glu Leu
20 25 30

Leu Leu Pro Arg Asp Val Val Glu His Trp Leu His Ala Gln Gly Gln
35 40 45

Gly Gln Pro Ser Leu Ser Val Ala Leu Pro Leu Asn Ile Asn His Asp
50 55 60

Asp Thr Ala Val Val Gly His Val Ala Ala Met Gln Ser Val Arg Asp
65 70 75 80

Gly Leu Phe Cys Leu Gly Cys Val Thr Ser Pro Arg Phe Leu Glu Ile
85 90 95

Val Arg Arg Ala Ser Glu Lys Ser Glu Leu Val Ser Arg Gly Pro Val
100 105 110

Ser Pro Leu Gln Pro Asp Lys Val Val Glu Phe Leu Ser Gly Ser Tyr
115 120 125

Ala Gly Leu Ser Leu Ser Ser Arg Arg Cys Asp Asp Val Glu Gln Ala
130 135 140

Thr Ser Leu Ser Gly Ser Glu Thr Thr Pro Phe Lys His Val Ala Leu
145 150 155 160

Cys Ser Val Gly Arg Arg Arg Gly Thr Leu Ala Val Tyr Gly Arg Asp
165 170 175

Pro Glu Trp Val Thr Gln Arg Phe Pro Asp Leu Thr Ala Ala Asp Arg
180 185 190

Asp Gly Leu Arg Ala Gln Trp Gln Arg Cys Gly Ser Thr Ala Val Asp
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195 

200 205

Ala Ser Gly Asp Pro Phe Arg Ser Asp Ser Tyr Gly Leu Leu Gly Asn
210 215 220

Ser Val Asp Ala Leu Tyr Ile Arg Glu Arg Leu Pro Lys Leu Arg Tyr
225 230 235 240

Asp Lys Gln Leu Val Gly Val Thr Glu Arg Glu Ser Tyr Val Lys Ala
245 250 255



<210> 17 

<211> 248 

<212> PRT 

<213> Escherichia coli 



<400> 17 

Val Arg Ser Phe Ile Tyr Glu Pro Phe Gln Ile Pro Ser Gly Ser Met
1 5 10 15

Met Pro Thr Leu Leu Ile Gly Asp Phe Ile Leu Val Glu Lys Phe Ala
20 25 30

Tyr Gly Ile Lys Asp Pro Ile Tyr Gln Lys Thr Leu Ile Glu Thr Gly
35 40 45

His Pro Lys Arg Gly Asp Ile Val Val Phe Lys Tyr Pro Glu Asp Pro
50 55 60

Lys Leu Asp Tyr Ile Lys Arg Ala Val Gly Leu Pro Gly Asp Lys Val
65 70 75 80

Thr Tyr Asp Pro Val Ser Lys Glu Leu Thr Ile Gln Pro Gly Cys Ser
85 90 95

Ser Gly Gln Ala Cys Glu Asn Ala Leu Pro Val Thr Tyr Ser Asn Val
100 105 110

Glu Pro Ser Asp Phe Val Gln Thr Phe Ser Arg Arg Asn Gly Gly Glu
115 120 125

Ala Thr Ser Gly Phe Phe Glu Val Pro Lys Asn Glu Thr Lys Glu Asn
130 135 140

Gly Ile Arg Leu Ser Glu Arg Lys Glu Thr Leu Gly Asp Val Thr His
145 150 155 160

Arg Ile Leu Thr Val Pro Ile Ala Gln Asp Gln Val Gly Met Tyr Tyr
165 170 175

Gln Gln Pro Gly Gln Gln Leu Ala Thr Trp Ile Val Pro Pro Gly Gln
180 185 190

Tyr Phe Met Met Gly Asp Asn Arg Asp Asn Ser Ala Asp Ser Arg Tyr
195 200 205

Trp Gly Phe Val Pro Glu Ala Asn Leu Val Gly Arg Ala Thr Ala Ile
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19 

210 215 
Trp Met Ser Phe Asp Lys Gin Glu Gly Glu 
225 230 
Leu Ser Arg Ile Gly Gly Ile His

245 
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220 

Trp Pro Thr Gly Leu Arg 
235 240 



<210> 18 

<211> 317

<212> PRT 

<213> Serratia marcescens 



<400> 18 

Met Glu Gln Leu Arg Gly Leu Tyr Pro Pro Leu Ala Ala Tyr Asp Ser
1 5 10 15

Gly Trp Leu Asp Thr Gly Asp Gly His Arg Ile Tyr Trp Glu Leu Ser
20 25 30

Gly Asn Pro Asn Gly Lys Pro Ala Val Phe Ile His Gly Gly Pro Gly
35 40 45

Gly Gly Ile Ser Pro His His Arg Gln Leu Phe Asp Pro Glu Arg Tyr
50 55 60

Lys Val Leu Leu Phe Asp Gln Arg Gly Cys Gly Arg Ser Arg Pro His
65 70 75 80

Ala Ser Leu Asp Asn Asn Thr Thr Trp His Leu Val Ala Asp Ile Glu
85 90 95

Arg Leu Arg Glu Met Ala Gly Val Glu Gln Trp Leu Val Phe Gly Gly
100 105 110

Ser Trp Gly Ser Thr Leu Ala Leu Ala Tyr Ala Gln Thr His Pro Glu
115 120 125

Arg Val Ser Glu Met Val Leu Arg Gly Ile Phe Thr Leu Arg Lys Gln
130 135 140

Arg Leu His Trp Tyr Tyr Gln Asp Gly Ala Ser Arg Phe Phe Pro Glu
145 150 155 160

Lys Trp Glu Arg Val Leu Ser Ile Leu Ser Asp Asp Glu Arg Lys Asp
165 170 175

Val Ile Ala Ala Tyr Arg Gln Arg Leu Thr Ser Ala Asp Pro Gln Val
180 185 190

Gln Leu Glu Ala Ala Lys Leu Trp Ser Val Trp Glu Gly Glu Thr Val
195 200 205

Thr Leu Leu Pro Ser Arg Glu Ser Ala Ser Phe Gly Glu Asp Asp Phe
210 215 220

Ala Leu Ala Phe Ala Arg Ile Glu Asn His Tyr Phe Thr His Leu Gly
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225 

230 235 240

Phe Leu Glu Ser Asp Asp Gln Leu Leu Arg Asn Val Pro Leu Ile Arg
245 250 255

His Ile Pro Ala Val Ile Val His Gly Arg Tyr Asp Met Ala Cys Gln
260 265 270

Val Gln Asn Ala Trp Asp Leu Ala Lys Ala Trp Pro Glu Ala Glu Leu
275 280 285

His Ile Val Glu Gly Ala Gly His Ser Tyr Asp Glu Pro Gly Ile Leu
290 295 300

His Gln Leu Met Ile Ala Thr Asp Arg Phe Ala Gly Lys
305 310 315



<210> 19 
<211> 229 
<212> PRT 

<213> Escherichia coli 
<400> 19 

Met Glu Leu Leu Leu Leu Ser Asn Ser Thr Leu Pro Gly Lys Ala Trp 
1 5 10 15 

Leu Glu His Ala Leu Pro Leu Ile Ala Asn Gln Leu Asn Gly Arg Arg 
20 25 30 

Ser Ala Val Phe Ile Pro Phe Ala Gly Val Thr Gln Thr Trp Asp Glu 
35 40 45 

Tyr Thr Asp Lys Thr Ala Glu Val Leu Ala Pro Leu Gly Val Asn Val 
50 55 60 

Thr Gly Ile His Arg Val Ala Asp Pro Leu Ala Ala Ile Glu Lys Ala 
65 70 75 80 

Glu Ile Ile Ile Val Gly Gly Gly Asn Thr Phe Gln Leu Leu Lys Glu 
85 90 95 

Ser Arg Glu Arg Gly Leu Leu Ala Pro Met Ala Asp Arg Val Lys Arg 
100 105 110 

Gly Ala Leu Tyr Ile Gly Trp Ser Ala Gly Ala Asn Leu Ala Cys Pro 
115 120 125 

Thr Ile Arg Thr Thr Asn Asp Met Pro Ile Val Asp Pro Asn Gly Phe 
130 135 140 

Asp Ala Leu Asp Leu Phe Pro Leu Gln Ile Asn Pro His Phe Thr Asn 
145 150 155 160 

Ala Leu Pro Glu Gly His Lys Gly Glu Thr Arg Glu Gln Arg Ile Arg 
165 170 175 

Glu Leu Leu Val Val Ala Pro Glu Leu Thr Val Ile Gly Leu Pro Glu 
180 185 190 

Gly Asn Trp Ile Gln Val Ser Asn Gly Gln Ala Val Leu Gly Gly Pro 
195 200 205 

Asn Thr Thr Trp Val Phe Lys Ala Gly Glu Glu Ala Val Ala Leu Glu 
210 215 220 

Ala Gly His Arg Phe 
225 


<210> 20 
<211> 99 
<212> PRT 

<213> Human immunodeficiency virus 
<400> 20 

Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly 
1 5 10 15 

Gly Gln Leu Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly 
35 40 45 

Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile 
50 55 60 

Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

Pro Val Asn Ile Ile Gly Arg Asn Met Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

Leu Asn Phe 



<210> 21 
<211> 297 
<212> PRT 

<213> Escherichia coli 
<400> 21 

Ser Thr Glu Thr Leu Ser Phe Thr Pro Asp Asn Ile Asn Ala Asp Ile 
1 5 10 15 

Ser Leu Gly Thr Leu Ser Gly Lys Thr Lys Glu Arg Val Tyr Leu Ala 
20 25 30 

Glu Glu Gly Gly Arg Lys Val Ser Gln Leu Asp Trp Lys Phe Asn Asn 
35 40 45 

Ala Ala Ile Ile Lys Gly Ala Ile Asn Trp Asp Leu Met Pro Gln Ile 
50 55 60 

Ser Ile Gly Ala Ala Gly Trp Thr Thr Leu Gly Ser Arg Gly Gly Asn 
65 70 75 80 

Met Val Asp Gln Asp Trp Met Asp Ser Ser Asn Pro Gly Thr Trp Thr 
85 90 95 

Asp Glu Ala Arg His Pro Asp Thr Gln Leu Asn Tyr Ala Asn Glu Phe 
100 105 110 

Asp Leu Asn Ile Lys Gly Trp Leu Leu Asn Glu Pro Asn Tyr Arg Leu 
115 120 125 

Gly Leu Met Ala Gly Tyr Gln Glu Ser Arg Tyr Ser Phe Thr Ala Arg 
130 135 140 

Gly Gly Ser Tyr Ile Tyr Ser Ser Glu Glu Gly Phe Arg Asp Asp Ile 
145 150 155 160 

Gly Ser Phe Pro Asn Gly Glu Arg Ala Ile Gly Tyr Lys Gln Arg Phe 
165 170 175 

Lys Met Pro Tyr Ile Gly Leu Thr Gly Ser Tyr Arg Tyr Glu Asp Phe 
180 185 190 

Glu Leu Gly Gly Thr Phe Lys Tyr Ser Gly Trp Val Glu Ser Ser Asp 
195 200 205 

Asn Asp Glu His Tyr Asp Pro Lys Gly Arg Ile Thr Tyr Arg Ser Lys 
210 215 220 

Val Lys Asp Gln Asn Tyr Tyr Ser Val Ala Val Asn Ala Gly Tyr Tyr 
225 230 235 240 

Val Thr Pro Asn Ala Lys Val Tyr Val Glu Gly Ala Trp Asn Arg Val 
245 250 255 

Thr Asn Lys Lys Gly Asn Thr Ser Leu Tyr Asp His Asn Asn Asn Thr 
260 265 270 

Ser Asp Tyr Ser Lys Asn Gly Ala Gly Ile Glu Asn Tyr Asn Phe Ile 
275 280 285 

Thr Thr Ala Gly Leu Lys Tyr Thr Phe 
290 295 



<210> 22 
<211> 212 
<212> PRT 

<213> Carica papaya 
<400> 22 

Ile Pro Glu Tyr Val Asp Trp Arg Gln Lys Gly Ala Val Thr Pro Val 
1 5 10 15 

Lys Asn Gln Gly Ser Cys Gly Ser Cys Trp Ala Phe Ser Ala Val Val 
20 25 30 

Thr Ile Glu Gly Ile Ile Lys Ile Arg Thr Gly Asn Leu Asn Gln Tyr 
35 40 45 

Ser Glu Gln Glu Leu Leu Asp Cys Asp Arg Arg Ser Tyr Gly Cys Asn 
50 55 60 

Gly Gly Tyr Pro Trp Ser Ala Leu Gln Leu Val Ala Gln Tyr Gly Ile 
65 70 75 80 

His Tyr Arg Asn Thr Tyr Pro Tyr Glu Gly Val Gln Arg Tyr Cys Arg 
85 90 95 

Ser Arg Glu Lys Gly Pro Tyr Ala Ala Lys Thr Asp Gly Val Arg Gln 
100 105 110 

Val Gln Pro Tyr Asn Gln Gly Ala Leu Leu Tyr Ser Ile Ala Asn Gln 
115 120 125 

Pro Val Ser Val Val Leu Gln Ala Ala Gly Lys Asp Phe Gln Leu Tyr 
130 135 140 

Arg Gly Gly Ile Phe Val Gly Pro Cys Gly Asn Lys Val Asp His Ala 
145 150 155 160 

Val Ala Ala Val Gly Tyr Gly Pro Asn Tyr Ile Leu Ile Lys Asn Ser 
165 170 175 

Trp Gly Thr Gly Trp Gly Glu Asn Gly Tyr Ile Arg Ile Lys Arg Gly 
180 185 190 

Thr Gly Asn Ser Tyr Gly Val Cys Gly Leu Tyr Thr Ser Ser Phe Tyr 
195 200 205 

Pro Val Lys Asn 
210 



<210> 23 

<211> 699 

<212> PRT 

<213> Homo sapiens 



<400> 23 

Ala Gly Ile Ala Ala Lys Leu Ala Lys Asp Arg Glu Ala Ala Glu Gly 
1 5 10 15 

Leu Gly Ser His Glu Arg Ala Ile Lys Tyr Leu Asn Gln Asp Tyr Glu 
20 25 30 

Ala Leu Arg Asn Glu Cys Leu Glu Ala Gly Thr Leu Phe Gln Asp Pro 
35 40 45 

Ser Phe Pro Ala Ile Pro Ser Ala Leu Gly Phe Lys Glu Leu Gly Pro 
50 55 60 

Tyr Ser Ser Lys Thr Arg Gly Met Arg Trp Lys Arg Pro Thr Glu Ile 
65 70 75 80 

Cys Ala Asp Pro Gln Phe Ile Ile Gly Gly Ala Thr Arg Thr Asp Ile 
85 90 95 

Cys Gln Gly Ala Leu Gly Asp Cys Trp Leu Leu Ala Ala Ile Ala Ser 
100 105 110 

Leu Thr Leu Asn Glu Glu Ile Leu Ala Arg Val Val Pro Leu Asn Gln 
115 120 125 

Ser Phe Gln Glu Asn Tyr Ala Gly Ile Phe His Phe Gln Phe Trp Gln 
130 135 140 

Tyr Gly Glu Trp Val Glu Val Val Val Asp Asp Arg Leu Pro Thr Lys 
145 150 155 160 

Asp Gly Glu Leu Leu Phe Val His Ser Ala Glu Gly Ser Glu Phe Trp 
165 170 175 

Ser Ala Leu Leu Glu Lys Ala Tyr Ala Lys Ile Asn Gly Cys Tyr Glu 
180 185 190 

Ala Leu Ser Gly Gly Ala Thr Thr Glu Gly Phe Glu Asp Phe Thr Gly 
195 200 205 

Gly Ile Ala Glu Trp Tyr Glu Leu Lys Lys Pro Pro Pro Asn Leu Phe 
210 215 220 

Lys Ile Ile Gln Lys Ala Leu Gln Lys Gly Ser Leu Leu Gly Cys Ser 
225 230 235 240 

Ile Asp Ile Thr Ser Ala Ala Asp Ser Glu Ala Ile Thr Phe Gln Lys 
245 250 255 

Leu Val Lys Gly His Ala Tyr Ser Val Thr Gly Ala Glu Glu Val Glu 
260 265 270 

Ser Asn Gly Ser Leu Gln Lys Leu Ile Arg Ile Arg Asn Pro Trp Gly 
275 280 285 

Glu Val Glu Trp Thr Gly Arg Trp Asn Asp Asn Cys Pro Ser Trp Asn 
290 295 300 

Thr Ile Asp Pro Glu Glu Arg Glu Arg Leu Thr Arg Arg His Glu Asp 
305 310 315 320 

Gly Glu Phe Trp Met Ser Phe Ser Asp Phe Leu Arg His Tyr Ser Arg 
325 330 335 

Leu Glu Ile Cys Asn Leu Thr Pro Asp Thr Leu Thr Ser Asp Thr Tyr 
340 345 350 

Lys Lys Trp Lys Leu Thr Lys Met Asp Gly Asn Trp Arg Arg Gly Ser 
355 360 365 

Thr Ala Gly Gly Cys Arg Asn Tyr Pro Asn Thr Phe Trp Met Asn Pro 
370 375 380 

Gln Tyr Leu Ile Lys Leu Glu Glu Glu Asp Glu Asp Glu Glu Asp Gly 
385 390 395 400 

Glu Ser Gly Cys Thr Phe Leu Val Gly Leu Ile Gln Lys His Arg Arg 
405 410 415 

Arg Gln Arg Lys Met Gly Glu Asp Met His Thr Ile Gly Phe Gly Ile 
420 425 430 

Tyr Glu Val Pro Glu Glu Leu Ser Gly Gln Thr Asn Ile His Leu Ser 
435 440 445 

Lys Asn Phe Phe Leu Thr Asn Arg Ala Arg Glu Arg Ser Asp Thr Phe 
450 455 460 

Ile Asn Leu Arg Glu Val Leu Asn Arg Phe Lys Leu Pro Pro Gly Glu 
465 470 475 480 

Tyr Ile Leu Val Pro Ser Thr Phe Glu Pro Asn Lys Asp Gly Asp Phe 
485 490 495 

Cys Ile Arg Val Phe Ser Glu Lys Lys Ala Asp Tyr Gln Ala Val Asp 
500 505 510 

Asp Glu Ile Glu Ala Asn Leu Glu Glu Phe Asp Ile Ser Glu Asp Asp 
515 520 525 

Ile Asp Asp Gly Val Arg Arg Leu Phe Ala Gln Leu Ala Gly Glu Asp 
530 535 540 

Ala Glu Ile Ser Ala Phe Glu Leu Gln Thr Ile Leu Arg Arg Val Leu 
545 550 555 560 

Ala Lys Arg Gln Asp Ile Lys Ser Asp Gly Phe Ser Ile Glu Thr Cys 
565 570 575 

Lys Ile Met Val Asp Met Leu Asp Ser Asp Gly Ser Gly Lys Leu Gly 
580 585 590 

Leu Lys Glu Phe Tyr Ile Leu Trp Thr Lys Ile Gln Lys Tyr Gln Lys 
595 600 605 

Ile Tyr Arg Glu Ile Asp Val Asp Arg Ser Gly Thr Met Asn Ser Tyr 
610 615 620 

Glu Met Arg Lys Ala Leu Glu Glu Ala Gly Phe Lys Met Pro Cys Gln 
625 630 635 640 

Leu His Gln Val Ile Val Ala Arg Phe Ala Asp Asp Gln Leu Ile Ile 
645 650 655 

Asp Phe Asp Asn Phe Val Arg Cys Leu Val Arg Leu Glu Thr Leu Phe 
660 665 670 

Lys Ile Phe Lys Gln Leu Asp Pro Glu Asn Thr Gly Thr Ile Glu Leu 
675 680 685 

Asp Leu Ile Ser Trp Leu Cys Phe Ser Val Leu 
690 695 



<210> 24 
<211> 221 
<212> PRT 

<213> Tobacco etch virus 



<400> 24 

Gly Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr Asn Pro Ile Ser Ser 
1 5 10 15 

Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser Leu 
20 25 30 

Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe 
35 40 45 

Arg Arg Asn Asn Gly Thr Leu Leu Val Gln Ser Leu His Gly Val Phe 
50 55 60 

Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp Gly Arg 
65 70 75 80 

Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln 
85 90 95 

Lys Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val 
100 105 110 

Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val Ser Asp Thr 
115 120 125 

Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile 
130 135 140 

Gln Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp 
145 150 155 160 

Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn Phe Thr Asn Thr Asn 
165 170 175 

Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn 
180 185 190 

Gln Glu Ala Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser 
195 200 205 

Val Leu Trp Gly Gly His Lys Val Phe Met Asp Lys Pro 
210 215 220 



<210> 25 

<211> 371 

<212> PRT 

<213> Streptococcus pyogenes 



<400> 25 

Asp Gln Asn Phe Ala Arg Asn Glu Lys Glu Ala Lys Asp Ser Ala Ile 
1 5 10 15 

Thr Phe Ile Gln Lys Ser Ala Ala Ile Lys Ala Gly Ala Arg Ser Ala 
20 25 30 

Glu Asp Ile Lys Leu Asp Lys Val Asn Leu Gly Gly Glu Leu Ser Gly 
35 40 45 

Ser Asn Met Tyr Val Tyr Asn Ile Ser Thr Gly Gly Phe Val Ile Val 
50 55 60 

Ser Gly Asp Lys Arg Ser Pro Glu Ile Leu Gly Tyr Ser Thr Ser Gly 
65 70 75 80 

Ser Phe Asp Val Asn Gly Lys Glu Asn Ile Ala Ser Phe Met Glu Ser 
85 90 95 

Tyr Val Glu Gln Ile Lys Glu Asn Lys Lys Leu Asp Ser Thr Tyr Ala 
100 105 110 

Gly Thr Ala Glu Ile Lys Gln Pro Val Val Lys Ser Leu Leu Asp Ser 
115 120 125 

Lys Gly Ile His Tyr Asn Gln Gly Asn Pro Tyr Asn Leu Leu Thr Pro 
130 135 140 

Val Ile Glu Lys Val Lys Pro Gly Glu Gln Ser Phe Val Gly Gln His 
145 150 155 160 

Ala Ala Thr Gly Ser Val Ala Thr Ala Thr Ala Gln Ile Met Lys Tyr 
165 170 175 

His Asn Tyr Pro Asn Lys Gly Leu Lys Asp Tyr Thr Tyr Thr Leu Ser 
180 185 190 

Ser Asn Asn Pro Tyr Phe Asn His Pro Lys Asn Leu Phe Ala Ala Ile 
195 200 205 

Ser Thr Arg Gln Tyr Asn Trp Asn Asn Ile Leu Pro Thr Tyr Ser Gly 
210 215 220 

Arg Glu Ser Asn Val Gln Lys Met Ala Ile Ser Glu Leu Met Ala Asp 
225 230 235 240 

Val Gly Ile Ser Val Asp Met Asp Tyr Gly Pro Ser Ser Gly Ser Ala 
245 250 255 

Gly Ser Ser Arg Val Gln Arg Ala Leu Lys Glu Asn Phe Gly Tyr Asn 
260 265 270 

Gln Ser Val His Gln Ile Asn Arg Gly Asp Phe Ser Lys Gln Asp Trp 
275 280 285 

Glu Ala Gln Ile Asp Lys Glu Leu Ser Gln Asn Gln Pro Val Tyr Tyr 
290 295 300 

Gln Gly Val Gly Lys Val Gly Gly His Ala Phe Val Ile Asp Gly Ala 
305 310 315 320 

Asp Gly Arg Asn Phe Tyr His Val Asn Trp Gly Trp Gly Gly Val Ser 
325 330 335 

Asp Gly Phe Phe Arg Leu Asp Ala Leu Asn Pro Ser Ala Leu Gly Thr 
340 345 350 

Gly Gly Gly Ala Gly Gly Phe Asn Gly Tyr Gln Ser Ala Val Val Gly 
355 360 365 

Ile Lys Pro 
370 



<210> 26 

<211> 353 

<212> PRT 

<213> Homo sapiens 



<400> 26 

Lys Lys His Thr Gly Tyr Val Gly Leu Lys Asn Gln Gly Ala Thr Cys 
1 5 10 15 

Tyr Met Asn Ser Leu Leu Gln Thr Leu Phe Phe Thr Asn Gln Leu Arg 
20 25 30 

Lys Ala Val Tyr Met Met Pro Thr Glu Gly Asp Asp Ser Ser Lys Ser 
35 40 45 

Val Pro Leu Ala Leu Gln Arg Val Phe Tyr Glu Leu Gln His Ser Asp 
50 55 60 

Lys Pro Val Gly Thr Lys Lys Leu Thr Lys Ser Phe Gly Trp Glu Thr 
65 70 75 80 

Leu Asp Ser Phe Met Gln His Asp Val Gln Glu Leu Cys Arg Val Leu 
85 90 95 

Leu Asp Asn Val Glu Asn Lys Met Lys Gly Thr Cys Val Glu Gly Thr 
100 105 110 

Ile Pro Lys Leu Phe Arg Gly Lys Met Val Ser Tyr Ile Gln Cys Lys 
115 120 125 

Glu Val Asp Tyr Arg Ser Asp Arg Arg Glu Asp Tyr Tyr Asp Ile Gln 
130 135 140 

Leu Ser Ile Lys Gly Lys Lys Asn Ile Phe Glu Ser Phe Val Asp Tyr 
145 150 155 160 

Val Ala Val Glu Gln Leu Asp Gly Asp Asn Lys Tyr Asp Ala Gly Glu 
165 170 175 

His Gly Leu Gln Glu Ala Glu Lys Gly Val Lys Phe Leu Thr Leu Pro 
180 185 190 

Pro Val Leu His Leu Gln Leu Met Arg Phe Met Tyr Asp Pro Gln Thr 
195 200 205 

Asp Gln Asn Ile Lys Ile Asn Asp Arg Phe Glu Phe Pro Glu Gln Leu 
210 215 220 

Pro Leu Asp Glu Phe Leu Gln Lys Thr Asp Pro Lys Asp Pro Ala Asn 
225 230 235 240 

Tyr Ile Leu His Ala Val Leu Val His Ser Gly Asp Asn His Gly Gly 
245 250 255 

His Tyr Val Val Tyr Leu Asn Pro Lys Gly Asp Gly Lys Trp Cys Lys 
260 265 270 

Phe Asp Asp Asp Val Val Ser Arg Cys Thr Lys Glu Glu Ala Ile Glu 
275 280 285 

His Asn Tyr Gly Gly His Asp Asp Asp Leu Ser Val Arg His Cys Thr 
290 295 300 

Asn Ala Tyr Met Leu Val Tyr Ile Arg Glu Ser Lys Leu Ser Glu Val 
305 310 315 320 

Leu Gln Ala Val Thr Asp His Asp Ile Pro Gln Gln Leu Val Glu Arg 
325 330 335 

Leu Gln Glu Glu Lys Arg Ile Glu Ala Gln Lys Arg Lys Glu Arg Gln 
340 345 350 

Glu 



<210> 27 

<211> 174 

<212> PRT 

<213> Staphylococcus aureus 



<400> 27 

Tyr Asn Glu Gln Tyr Val Asn Lys Leu Glu Asn Phe Lys Ile Arg Glu 
1 5 10 15 

Thr Gln Gly Asn Asn Gly Trp Cys Ala Gly Tyr Thr Met Ser Ala Leu 
20 25 30 

Leu Asn Ala Thr Tyr Asn Thr Asn Lys Tyr His Ala Glu Ala Val Met 
35 40 45 

Arg Phe Leu His Pro Asn Leu Gln Gly Gln Gln Phe Gln Phe Thr Gly 
50 55 60 

Leu Thr Pro Arg Glu Met Ile Tyr Phe Gly Gln Thr Gln Gly Arg Ser 
65 70 75 80 

Pro Gln Leu Leu Asn Arg Met Thr Thr Tyr Asn Glu Val Asp Asn Leu 
85 90 95 

Thr Lys Asn Asn Lys Gly Ile Ala Ile Leu Gly Ser Arg Val Glu Ser 
100 105 110 

Arg Asn Gly Met His Ala Gly His Ala Met Ala Val Val Gly Asn Ala 
115 120 125 

Lys Leu Asn Asn Gly Gln Glu Val Ile Ile Ile Trp Asn Pro Trp Asp 
130 135 140 

Asn Gly Phe Met Thr Gln Asp Ala Lys Asn Asn Val Ile Pro Val Ser 
145 150 155 160 

Asn Gly Asp His Tyr Gln Trp Tyr Ser Ser Ile Tyr Gly Tyr 
165 170 



<210> 28 
<211> 221 
<212> PRT 

<213> Saccharomyces cerevisiae 
<400> 28 

Gly Ser Leu Val Pro Glu Leu Asn Glu Lys Asp Asp Asp Gln Val Gln 
1 5 10 15 

Lys Ala Leu Ala Ser Arg Glu Asn Thr Gln Leu Met Asn Arg Asp Asn 
20 25 30 

Ile Glu Ile Thr Val Arg Asp Phe Lys Thr Leu Ala Pro Arg Arg Trp 
35 40 45 

Leu Asn Asp Thr Ile Ile Glu Phe Phe Met Lys Tyr Ile Glu Lys Ser 
50 55 60 

Thr Pro Asn Thr Val Ala Phe Asn Ser Phe Phe Tyr Thr Asn Leu Ser 
65 70 75 80 

Glu Arg Gly Tyr Gln Gly Val Arg Arg Trp Met Lys Arg Lys Lys Thr 
85 90 95 

Gln Ile Asp Lys Leu Asp Lys Ile Phe Thr Pro Ile Asn Leu Asn Gln 
100 105 110 

Ser His Trp Ala Leu Gly Ile Ile Asp Leu Lys Lys Lys Thr Ile Gly 
115 120 125 

Tyr Val Asp Ser Leu Ser Asn Gly Pro Asn Ala Met Ser Phe Ala Ile 
130 135 140 

Leu Thr Asp Leu Gln Lys Tyr Val Met Glu Glu Ser Lys His Thr Ile 
145 150 155 160 

Gly Glu Asp Phe Asp Leu Ile His Leu Asp Cys Pro Gln Gln Pro Asn 
165 170 175 

Gly Tyr Asp Cys Gly Ile Tyr Val Cys Met Asn Thr Leu Tyr Gly Ser 
180 185 190 

Ala Asp Ala Pro Leu Asp Phe Asp Tyr Lys Asp Ala Ile Arg Met Arg 
195 200 205 

Arg Phe Ile Ala His Leu Ile Leu Thr Asp Ala Leu Lys 
210 215 220 
<210> 29 

<211> 166 

<212> PRT 

<213> Pyrococcus horikoshii 



<400> 29 

Met Lys Val Leu Phe Leu Thr Ala Asn Glu Phe Glu Asp Val Glu Leu 
1 5 10 15 

Ile Tyr Pro Tyr His Arg Leu Lys Glu Glu Gly His Glu Val Tyr Ile 
20 25 30 

Ala Ser Phe Glu Arg Gly Thr Ile Thr Gly Lys His Gly Tyr Ser Val 
35 40 45 

Lys Val Asp Leu Thr Phe Asp Lys Val Asn Pro Glu Glu Phe Asp Ala 
50 55 60 

Leu Val Leu Pro Gly Gly Arg Ala Pro Glu Arg Val Arg Leu Asn Glu 
65 70 75 80 

Lys Ala Val Ser Ile Ala Arg Lys Met Phe Ser Glu Gly Lys Pro Val 
85 90 95 

Ala Ser Ile Cys His Gly Pro Gln Ile Leu Ile Ser Ala Gly Val Leu 
100 105 110 

Arg Gly Arg Lys Gly Thr Ser Tyr Pro Gly Ile Lys Asp Asp Met Ile 
115 120 125 

Asn Ala Gly Val Glu Trp Val Asp Ala Glu Val Val Val Asp Gly Asn 
130 135 140 

Trp Val Ser Ser Arg Val Pro Ala Asp Leu Tyr Ala Trp Met Arg Glu 
145 150 155 160 

Phe Val Lys Leu Leu Lys 
165 



<210> 30 
<211> 316 
<212> PRT 

<213> Bacillus thermoproteolyticus 
<400> 30 

Ile Thr Gly Thr Ser Thr Val Gly Val Gly Arg Gly Val Leu Gly Asp 
1 5 10 15 

Gln Lys Asn Ile Asn Thr Thr Tyr Ser Thr Tyr Tyr Tyr Leu Gln Asp 
20 25 30 

Asn Thr Arg Gly Asp Gly Ile Phe Thr Tyr Asp Ala Lys Tyr Arg Thr 
35 40 45 

Thr Leu Pro Gly Ser Leu Trp Ala Asp Ala Asp Asn Gln Phe Phe Ala 
50 55 60 

Ser Tyr Asp Ala Pro Ala Val Asp Ala His Tyr Tyr Ala Gly Val Thr 
65 70 75 80 

Tyr Asp Tyr Tyr Lys Asn Val His Asn Arg Leu Ser Tyr Asp Gly Asn 
85 90 95 

Asn Ala Ala Ile Arg Ser Ser Val His Tyr Ser Gln Gly Tyr Asn Asn 
100 105 110 

Ala Phe Trp Asn Gly Ser Glu Met Val Tyr Gly Asp Gly Asp Gly Gln 
115 120 125 

Thr Phe Ile Pro Leu Ser Gly Gly Ile Asp Val Val Ala His Glu Leu 
130 135 140 

Thr His Ala Val Thr Asp Tyr Thr Ala Gly Leu Ile Tyr Gln Asn Glu 
145 150 155 160 

Ser Gly Ala Ile Asn Glu Ala Ile Ser Asp Ile Phe Gly Thr Leu Val 
165 170 175 

Glu Phe Tyr Ala Asn Lys Asn Pro Asp Trp Glu Ile Gly Glu Asp Val 
180 185 190 

Tyr Thr Pro Gly Ile Ser Gly Asp Ser Leu Arg Ser Met Ser Asp Pro 
195 200 205 

Ala Lys Tyr Gly Asp Pro Asp His Tyr Ser Lys Arg Tyr Thr Gly Thr 
210 215 220 

Gln Asp Asn Gly Gly Val His Ile Asn Ser Gly Ile Ile Asn Lys Ala 
225 230 235 240 

Ala Tyr Leu Ile Ser Gln Gly Gly Thr His Tyr Gly Val Ser Val Val 
245 250 255 

Gly Ile Gly Arg Asp Lys Leu Gly Lys Ile Phe Tyr Arg Ala Leu Thr 
260 265 270 

Gln Tyr Leu Thr Pro Thr Ser Asn Phe Ser Gln Leu Arg Ala Ala Ala 
275 280 285 

Val Gln Ser Ala Thr Asp Leu Tyr Gly Ser Thr Ser Gln Glu Val Ala 
290 295 300 

Ser Val Lys Gln Ala Phe Asp Ala Val Gly Val Lys 
305 310 315 



<210> 31 

<211> 169 

<212> PRT 

<213> Homo sapiens 



<400> 31 
Val Leu Thr Glu Gly Asn Pro Arg Trp Glu Gln Thr His Leu Thr Tyr 
1 5 10 15 

Arg Ile Glu Asn Tyr Thr Pro Asp Leu Pro Arg Ala Asp Val Asp His 
20 25 30 

Ala Ile Glu Lys Ala Phe Gln Leu Trp Ser Asn Val Thr Pro Leu Thr 
35 40 45 

Phe Thr Lys Val Ser Glu Gly Gln Ala Asp Ile Met Ile Ser Phe Val 
50 55 60 

Arg Gly Asp His Arg Asp Asn Ser Pro Phe Asp Gly Pro Gly Gly Asn 
65 70 75 80 

Leu Ala His Ala Phe Gln Pro Gly Pro Gly Ile Gly Gly Asp Ala His 
85 90 95 

Phe Asp Glu Asp Glu Arg Trp Thr Asn Asn Phe Arg Glu Tyr Asn Leu 
100 105 110 

His Arg Val Ala Ala His Glu Leu Gly His Ser Leu Gly Leu Ser His 
115 120 125 

Ser Thr Asp Ile Gly Ala Leu Met Tyr Pro Ser Tyr Thr Phe Ser Gly 
130 135 140 

Asp Val Gln Leu Ala Gln Asp Asp Ile Asp Gly Ile Gln Ala Ile Tyr 
145 150 155 160 

Gly Arg Ser Gln Asn Pro Val Gln Pro 
165 



<210> 32 

<211> 496 

<212> PRT 

<213> Homo sapiens 

<400> 32 

Gln Tyr Ser Pro Asn Thr Gln Gln Gly Arg Thr Ser Ile Val His Leu 
1 5 10 15 

Phe Glu Trp Arg Trp Val Asp Ile Ala Leu Glu Cys Glu Arg Tyr Leu 
20 25 30 

Ala Pro Lys Gly Phe Gly Gly Val Gln Val Ser Pro Pro Asn Glu Asn 
35 40 45 

Val Ala Ile Tyr Asn Pro Phe Arg Pro Trp Trp Glu Arg Tyr Gln Pro 
50 55 60 

Val Ser Tyr Lys Leu Cys Thr Arg Ser Gly Asn Glu Asp Glu Phe Arg 
65 70 75 80 

Asn Met Val Thr Arg Cys Asn Asn Val Gly Val Arg Ile Tyr Val Asp 
85 90 95 

Ala Val Ile Asn His Met Cys Gly Asn Ala Val Ser Ala Gly Thr Ser 
100 105 110 

Ser Thr Cys Gly Ser Tyr Phe Asn Pro Gly Ser Arg Asp Phe Pro Ala 
115 120 125 

Val Pro Tyr Ser Gly Trp Asp Phe Asn Asp Gly Lys Cys Lys Thr Gly 
130 135 140 

Ser Gly Asp Ile Glu Asn Tyr Asn Asp Ala Thr Gln Val Arg Asp Cys 
145 150 155 160 

Arg Leu Thr Gly Leu Leu Asp Leu Ala Leu Glu Lys Asp Tyr Val Arg 
165 170 175 

Ser Lys Ile Ala Glu Tyr Met Asn His Leu Ile Asp Ile Gly Val Ala 
180 185 190 

Gly Phe Arg Leu Asp Ala Ser Lys His Met Trp Pro Gly Asp Ile Lys 
195 200 205 

Ala Ile Leu Asp Lys Leu His Asn Leu Asn Ser Asn Trp Phe Pro Ala 
210 215 220 

Gly Ser Lys Pro Phe Ile Tyr Gln Glu Val Ile Asp Leu Gly Gly Glu 
225 230 235 240 

Pro Ile Lys Ser Ser Asp Tyr Phe Gly Asn Gly Arg Val Thr Glu Phe 
245 250 255 

Lys Tyr Gly Ala Lys Leu Gly Thr Val Ile Arg Lys Trp Asn Gly Glu 
260 265 270 

Lys Met Ser Tyr Leu Lys Asn Trp Gly Glu Gly Trp Gly Phe Val Pro 
275 280 285 

Ser Asp Arg Ala Leu Val Phe Val Asp Asn His Asp Asn Gln Arg Gly 
290 295 300 

His Gly Ala Gly Gly Ala Ser Ile Leu Thr Phe Trp Asp Ala Arg Leu 
305 310 315 320 

Tyr Lys Met Ala Val Gly Phe Met Leu Ala His Pro Tyr Gly Phe Thr 
325 330 335 

Arg Val Met Ser Ser Tyr Arg Trp Pro Arg Gln Phe Gln Asn Gly Asn 
340 345 350 

Asp Val Asn Asp Trp Val Gly Pro Pro Asn Asn Asn Gly Val Ile Lys 
355 360 365 

Glu Val Thr Ile Asn Pro Asp Thr Thr Cys Gly Asn Asp Trp Val Cys 
370 375 380 

Glu His Arg Trp Arg Gln Ile Arg Asn Met Val Ile Phe Arg Asn Val 
385 390 395 400 

Val Asp Gly Gln Pro Phe Thr Asn Trp Tyr Asp Asn Gly Ser Asn Gln 
405 410 415 

Val Ala Phe Gly Arg Gly Asn Arg Gly Phe Ile Val Phe Asn Asn Asp 
420 425 430 

Asp Trp Ser Phe Ser Leu Thr Leu Gln Thr Gly Leu Pro Ala Gly Thr 
435 440 445 

Tyr Cys Asp Val Ile Ser Gly Asp Lys Ile Asn Gly Asn Cys Thr Gly 
450 455 460 

Ile Lys Ile Tyr Val Ser Asp Asp Gly Lys Ala His Phe Ser Ile Ser 
465 470 475 480 

Asn Ser Ala Glu Asp Pro Phe Ile Ala Ile His Ala Glu Ser Lys Leu 
485 490 495 



<210> 33 
<211> 370 
<212> PRT 

<213> Trichoderma reesei 
<400> 33 

Gln Pro Gly Thr Ser Thr Pro Glu Val His Pro Lys Leu Thr Thr Tyr 
1 5 10 15 

Lys Cys Thr Lys Ser Gly Gly Cys Val Ala Gln Asp Thr Ser Val Val 
20 25 30 

Leu Asp Trp Asn Tyr Arg Trp Met His Asp Ala Asn Tyr Asn Ser Cys 
35 40 45 

Thr Val Asn Gly Gly Val Asn Thr Thr Leu Cys Pro Asp Glu Ala Thr 
50 55 60 

Cys Gly Lys Asn Cys Phe Ile Glu Gly Val Asp Tyr Ala Ala Ser Gly 
65 70 75 80 

Val Thr Thr Ser Gly Ser Ser Leu Thr Met Asn Gln Tyr Met Pro Ser 
85 90 95 

Ser Ser Gly Gly Tyr Ser Ser Val Ser Pro Arg Leu Tyr Leu Leu Asp 
100 105 110 

Ser Asp Gly Glu Tyr Val Met Leu Lys Leu Asn Gly Gln Glu Leu Ser 
115 120 125 

Phe Asp Val Asp Leu Ser Ala Leu Pro Cys Gly Glu Asn Gly Ser Leu 
130 135 140 

Tyr Leu Ser Gln Met Asp Glu Asn Gly Gly Ala Asn Gln Tyr Asn Thr 
145 150 155 160 

Ala Gly Ala Asn Tyr Gly Ser Gly Tyr Cys Asp Ala Gln Cys Pro Val 
165 170 175 

Gln Thr Trp Arg Asn Gly Thr Leu Asn Thr Ser His Gln Gly Phe Cys 
180 185 190 

Cys Asn Glu Met Asp Ile Leu Glu Gly Asn Ser Arg Ala Asn Ala Leu 
195 200 205 

Thr Pro His Ser Cys Thr Ala Thr Ala Cys Asp Ser Ala Gly Cys Gly 
210 215 220 

Phe Asn Pro Tyr Gly Ser Gly Tyr Lys Ser Tyr Tyr Gly Pro Gly Asp 
225 230 235 240 

Thr Val Asp Thr Ser Lys Thr Phe Thr Ile Ile Thr Gln Phe Asn Thr 
245 250 255 

Asp Asn Gly Ser Pro Ser Gly Asn Leu Val Ser Ile Thr Arg Lys Tyr 
260 265 270 

Gln Gln Asn Gly Val Asp Ile Pro Ser Ala Gln Pro Gly Gly Asp Thr 
275 280 285 

Ile Ser Ser Cys Pro Ser Ala Ser Ala Tyr Gly Gly Leu Ala Thr Met 
290 295 300 

Gly Lys Ala Leu Ser Ser Gly Met Val Leu Val Phe Ser Ile Trp Asn 
305 310 315 320 

Asp Asn Ser Gln Tyr Met Asn Trp Leu Asp Ser Gly Asn Ala Gly Pro 
325 330 335 

Cys Ser Ser Thr Glu Gly Asn Pro Ser Asn Ile Leu Ala Asn Asn Pro 
340 345 350 

Asn Thr His Val Val Phe Ser Asn Ile Arg Trp Gly Asp Ile Gly Ser 
355 360 365 

Thr Thr 
370 



<210> 34 
<211> 223 
<212> PRT 

<213> Aspergillus niger 
<400> 34 

Gln Thr Met Cys Ser Gln Tyr Asp Ser Ala Ser Ser Pro Pro Tyr Ser 
1 5 10 15 

Val Asn Gln Asn Leu Trp Gly Glu Tyr Gln Gly Thr Gly Ser Gln Cys 
20 25 30 

Val Tyr Val Asp Lys Leu Ser Ser Ser Gly Ala Ser Trp His Thr Glu 
35 40 45 

Trp Thr Trp Ser Gly Gly Glu Gly Thr Val Lys Ser Tyr Ser Asn Ser 
50 55 60 

Gly Val Thr Phe Asn Lys Lys Leu Val Ser Asp Val Ser Ser Ile Pro 
65 70 75 80 

Thr Ser Val Glu Trp Lys Gln Asp Asn Thr Asn Val Asn Ala Asp Val 
85 90 95 

Ala Tyr Asp Leu Phe Thr Ala Ala Asn Val Asp His Ala Thr Ser Ser 
100 105 110 

Gly Asp Tyr Glu Leu Met Ile Trp Leu Ala Arg Tyr Gly Asn Ile Gln 
115 120 125 

Pro Ile Gly Lys Gln Ile Ala Thr Ala Thr Val Gly Gly Lys Ser Trp 
130 135 140 

Glu Val Trp Tyr Gly Ser Thr Thr Gln Ala Gly Ala Glu Gln Arg Thr 
145 150 155 160 

Tyr Ser Phe Val Ser Glu Ser Pro Ile Asn Ser Tyr Ser Gly Asp Ile 
165 170 175 

Asn Ala Phe Phe Ser Tyr Leu Thr Gln Asn Gln Gly Phe Pro Ala Ser 
180 185 190 

Ser Gln Tyr Leu Ile Asn Leu Gln Phe Gly Thr Glu Ala Phe Thr Gly 
195 200 205 

Gly Pro Ala Thr Phe Thr Val Asp Asn Trp Thr Ala Ser Val Asn 
210 215 220 



<210> 35 

<211> 184 

<212> PRT 

<213> Aspergillus niger 



<400> 35 

Ser Ala Gly Ile Asn Tyr Val Gln Asn Tyr Asn Gly Asn Leu Gly Asp 
1 5 10 15 

Phe Thr Tyr Asp Glu Ser Ala Gly Thr Phe Ser Met Tyr Trp Glu Asp 
20 25 30 

Gly Val Ser Ser Asp Phe Val Val Gly Leu Gly Trp Thr Thr Gly Ser 
35 40 45 

Ser Asn Ala Ile Thr Tyr Ser Ala Glu Tyr Ser Ala Ser Gly Ser Ala 
50 55 60 

Ser Tyr Leu Ala Val Tyr Gly Trp Val Asn Tyr Pro Gln Ala Glu Tyr 
65 70 75 80 

Tyr Ile Val Glu Asp Tyr Gly Asp Tyr Asn Pro Cys Ser Ser Ala Thr 
85 90 95 

Ser Leu Gly Thr Val Tyr Ser Asp Gly Ser Thr Tyr Gln Val Cys Thr 
100 105 110 

Asp Thr Arg Thr Asn Glu Pro Ser Ile Thr Gly Thr Ser Thr Phe Thr 
115 120 125 

Gln Tyr Phe Ser Val Arg Glu Ser Thr Arg Thr Ser Gly Thr Val Thr 
130 135 140 

Val Ala Asn His Phe Asn Phe Trp Ala His His Gly Phe Gly Asn Ser 
145 150 155 160 

Asp Phe Asn Tyr Gln Val Val Ala Val Glu Ala Trp Ser Gly Ala Gly 
165 170 175 

Ser Ala Ser Val Thr Ile Ser Ser 
180 



<210> 36 
<211> 313 
<212> PRT 

<213> Streptomyces lividans 
<400> 36 

Ala Glu Ser Thr Leu Gly Ala Ala Ala Ala Gln Ser Gly Arg Tyr Phe 
1 5 10 15 

Gly Thr Ala Ile Ala Ser Gly Arg Leu Ser Asp Ser Thr Tyr Thr Ser 
20 25 30 

Ile Ala Gly Arg Glu Phe Asn Met Val Thr Ala Glu Asn Glu Met Lys 
35 40 45 

Ile Asp Ala Thr Glu Pro Gln Arg Gly Gln Phe Asn Phe Ser Ser Ala 
50 55 60 

Asp Arg Val Tyr Asn Trp Ala Val Gln Asn Gly Lys Gln Val Arg Gly 
65 70 75 80 

His Thr Leu Ala Trp His Ser Gln Gln Pro Gly Trp Met Gln Ser Leu 
85 90 95 

Ser Gly Ser Ala Leu Arg Gln Ala Met Ile Asp His Ile Asn Gly Val 
100 105 110 

Met Ala His Tyr Lys Gly Lys Ile Val Gln Trp Asp Val Val Asn Glu 
115 120 125 

Ala Phe Ala Asp Gly Ser Ser Gly Ala Arg Arg Asp Ser Asn Leu Gln 
130 135 140 

Arg Ser Gly Asn Asp Trp Ile Glu Val Ala Phe Arg Thr Ala Arg Ala 
145 150 155 160 

Ala Asp Pro Ser Ala Lys Leu Cys Tyr Asn Asp Tyr Asn Val Glu Asn 
165 170 175 

Trp Thr Trp Ala Lys Thr Gln Ala Met Tyr Asn Met Val Arg Asp Phe 
180 185 190 

Lys Gln Arg Gly Val Pro Ile Asp Cys Val Gly Phe Gln Ser His Phe 
195 200 205 

Asn Ser Gly Ser Pro Tyr Asn Ser Asn Phe Arg Thr Thr Leu Gln Asn 
210 215 220 

Phe Ala Ala Leu Gly Val Asp Val Ala Ile Thr Glu Leu Asp Ile Gln 
225 230 235 240 

Gly Ala Pro Ala Ser Thr Tyr Ala Asn Val Thr Asn Asp Cys Leu Ala 
245 250 255 

Val Ser Arg Cys Leu Gly Ile Thr Val Trp Gly Val Arg Asp Ser Asp 
260 265 270 

Ser Trp Arg Ser Glu Gln Thr Pro Leu Leu Phe Asn Asn Asp Gly Ser 
275 280 285 

Lys Lys Ala Ala Tyr Thr Ala Val Leu Asp Ala Leu Asn Gly Gly Ala 
290 295 300 

Ser Ser Glu Pro Pro Ala Asp Gly Gly 
305 310 



<210> 37 
<211> 362 
<212> PRT 

<213> Aspergillus niger 
<400> 37 

Met His Ser Phe Ala Ser Leu Leu Ala Tyr Gly Leu Val Ala Gly Ala 
1 5 10 15 

Thr Phe Ala Ser Ala Ser Pro Ile Glu Ala Arg Asp Ser Cys Thr Phe 
20 25 30 

Thr Thr Ala Ala Ala Ala Lys Ala Gly Lys Ala Lys Cys Ser Thr Ile 
35 40 45 

Thr Leu Asn Asn Ile Glu Val Pro Ala Gly Thr Thr Leu Asp Leu Thr 
50 55 60 

Gly Leu Thr Ser Gly Thr Lys Val Ile Phe Glu Gly Thr Thr Thr Phe 
65 70 75 80 

Gln Tyr Glu Glu Trp Ala Gly Pro Leu Ile Ser Met Ser Gly Glu His 
85 90 95 

Ile Thr Val Thr Gly Ala Ser Gly His Leu Ile Asn Cys Asp Gly Ala 
100 105 110 

Arg Trp Trp Asp Gly Lys Gly Thr Ser Gly Lys Lys Lys Pro Lys Phe 
115 120 125 

Phe Tyr Ala His Gly Leu Asp Ser Ser Ser Ile Thr Gly Leu Asn Ile 
130 135 140 

Lys Asn Thr Pro Leu Met Ala Phe Ser Val Gln Ala Asn Asp Ile Thr 
145 150 155 160 

Phe Thr Asp Val Thr Ile Asn Asn Ala Asp Gly Asp Thr Gln Gly Gly 
165 170 175 

His Asn Thr Asp Ala Phe Asp Val Gly Asn Ser Val Gly Val Asn Ile 
180 185 190 

Ile Lys Pro Trp Val His Asn Gln Asp Asp Cys Leu Ala Val Asn Ser 
195 200 205 

Gly Glu Asn Ile Trp Phe Thr Gly Gly Thr Cys Ile Gly Gly His Gly 
210 215 220 

Leu Ser Ile Gly Ser Val Gly Asp Arg Ser Asn Asn Val Val Lys Asn 
225 230 235 240 

Val Thr Ile Glu His Ser Thr Val Ser Asn Ser Glu Asn Ala Val Arg 
245 250 255 

Ile Lys Thr Ile Ser Gly Ala Thr Gly Ser Val Ser Glu Ile Thr Tyr 
260 265 270 

Ser Asn Ile Val Met Ser Gly Ile Ser Asp Tyr Gly Val Val Ile Gln 
275 280 285 

Gln Asp Tyr Glu Asp Gly Lys Pro Thr Gly Lys Pro Thr Asn Gly Val 
290 295 300 

Thr Ile Gln Asp Val Lys Leu Glu Ser Val Thr Gly Ser Val Asp Ser 
305 310 315 320 

Gly Ala Thr Glu Ile Tyr Leu Leu Cys Gly Ser Gly Ser Cys Ser Asp 
325 330 335 

Trp Thr Trp Asp Asp Val Lys Val Thr Gly Gly Lys Lys Ser Thr Ala 
340 345 350 

Cys Lys Asn Phe Pro Ser Val Ala Ser Cys 
355 360 



<210> 38 
<211> 383 
<212> PRT 

<213> Pseudomonas cellulosa 
<400> 38 

Arg Ala Asp Val Lys Pro Val Thr Val Lys Leu Val Asp Ser Gln Ala 
1 5 10 15 

Thr Met Glu Thr Arg Ser Leu Phe Ala Phe Met Gln Glu Gln Arg Arg 
20 25 30 

His Ser Ile Met Phe Gly His Gln His Glu Thr Thr Gln Gly Leu Thr 
35 40 45 

Ile Thr Arg Thr Asp Gly Thr Gln Ser Asp Thr Phe Asn Ala Val Gly 
50 55 60 

Asp Phe Ala Ala Val Tyr Gly Trp Asp Thr Leu Ser Ile Val Ala Pro 
65 70 75 80 

Lys Ala Glu Gly Asp Ile Val Ala Gln Val Lys Lys Ala Tyr Ala Arg 
85 90 95 

Gly Gly Ile Ile Thr Val Ser Ser His Phe Asp Asn Pro Lys Thr Asp 
100 105 110 

Thr Gln Lys Gly Val Trp Pro Val Gly Thr Ser Trp Asp Gln Thr Pro 
115 120 125 

Ala Val Val Asp Ser Leu Pro Gly Gly Ala Tyr Asn Pro Val Leu Asn 
130 135 140 

Gly Tyr Leu Asp Gln Val Ala Glu Trp Ala Asn Asn Leu Lys Asp Glu 
145 150 155 160 

Gln Gly Arg Leu Ile Pro Val Ile Phe Arg Leu Tyr His Ala Asn Thr 
165 170 175 

Gly Ser Trp Phe Trp Trp Gly Asp Lys Gln Ser Thr Pro Glu Gln Tyr 
180 185 190 

Lys Gln Leu Phe Arg Tyr Ser Val Glu Tyr Leu Arg Asp Val Lys Gly 
195 200 205 

Val Arg Asn Phe Leu Tyr Ala Tyr Ser Pro Asn Asn Phe Trp Asp Val 
210 215 220 

Thr Glu Ala Asn Tyr Leu Glu Arg Tyr Pro Gly Asp Glu Trp Val Asp 
225 230 235 240 

Val Leu Gly Phe Asp Thr Tyr Gly Pro Val Ala Asp Asn Ala Asp Trp 
245 250 255 

Phe Arg Asn Val Val Ala Asn Ala Ala Leu Val Ala Arg Met Ala Glu 
260 265 270 

Ala Arg Gly Lys Ile Pro Val Ile Ser Glu Ile Gly Ile Arg Ala Pro 
275 280 285 

Asp Ile Glu Ala Gly Leu Tyr Asp Asn Gln Trp Tyr Arg Lys Leu Ile 
290 295 300 

Ser Gly Leu Lys Ala Asp Pro Asp Ala Arg Glu Ile Ala Phe Leu Leu 
305 310 315 320 

Val Trp Arg Asn Ala Pro Gln Gly Val Pro Gly Pro Asn Gly Thr Gln 
325 330 335 

Val Pro His Tyr Trp Val Pro Ala Asn Arg Pro Glu Asn Ile Asn Asn 
340 345 350 

Gly Thr Leu Glu Asp Phe Gln Ala Phe Tyr Ala Asp Glu Phe Thr Ala 
355 360 365 

Phe Asn Arg Asp Ile Glu Gln Val Tyr Gln Arg Pro Thr Leu Ile 
370 375 380 



<210> 39 
<211> 419 
<212> PRT

<213> Bacillus circulans 



<400> 39 

Leu Gln Pro Ala Thr Ala Glu Ala Ala Asp Ser Tyr Lys Ile Val Gly
1 5 10 15

Tyr Tyr Pro Ser Trp Ala Ala Tyr Gly Arg Asn Tyr Asn Val Ala Asp
20 25 30

Ile Asp Pro Thr Lys Val Thr His Ile Asn Tyr Ala Phe Ala Asp Ile
35 40 45

Cys Trp Asn Gly Ile His Gly Asn Pro Asp Pro Ser Gly Pro Asn Pro
50 55 60

Val Thr Trp Thr Cys Gln Asn Glu Lys Ser Gln Thr Ile Asn Val Pro
65 70 75 80

Asn Gly Thr Ile Val Leu Gly Asp Pro Trp Ile Asp Thr Gly Lys Thr
85 90 95

Phe Ala Gly Asp Thr Trp Asp Gln Pro Ile Ala Gly Asn Ile Asn Gln
100 105 110

Leu Asn Lys Leu Lys Gln Thr Asn Pro Asn Leu Lys Thr Ile Ile Ser
115 120 125

Val Gly Gly Trp Thr Trp Ser Asn Arg Phe Ser Asp Val Ala Ala Thr
130 135 140

Ala Ala Thr Arg Glu Val Phe Ala Asn Ser Ala Val Asp Phe Leu Arg
145 150 155 160

Lys Tyr Asn Phe Asp Gly Val Asp Leu Asp Trp Glu Tyr Pro Val Ser
165 170 175

Gly Gly Leu Asp Gly Asn Ser Lys Arg Pro Glu Asp Lys Gln Asn Tyr
180 185 190

Thr Leu Leu Leu Ser Lys Ile Arg Glu Lys Leu Asp Ala Ala Gly Ala
195 200 205

Val Asp Gly Lys Lys Tyr Leu Leu Thr Ile Ala Ser Gly Ala Ser Ala
210 215 220

Thr Tyr Ala Ala Asn Thr Glu Leu Ala Lys Ile Ala Ala Ile Val Asp
225 230 235 240

Trp Ile Asn Ile Met Thr Tyr Asp Phe Asn Gly Ala Trp Gln Lys Ile
245 250 255

Ser Ala His Asn Ala Pro Leu Asn Tyr Asp Pro Ala Ala Ser Ala Ala
260 265 270

Gly Val Pro Asp Ala Asn Thr Phe Asn Val Ala Ala Gly Ala Gln Gly
275 280 285

His Leu Asp Ala Gly Val Pro Ala Ala Lys Leu Val Leu Gly Val Pro
290 295 300

Phe Tyr Gly Arg Gly Trp Asp Gly Cys Ala Gln Ala Gly Asn Gly Gln
305 310 315 320

Tyr Gln Thr Cys Thr Gly Gly Ser Ser Val Gly Thr Trp Glu Ala Gly
325 330 335

Ser Phe Asp Phe Tyr Asp Leu Glu Ala Asn Tyr Ile Asn Lys Asn Gly
340 345 350

Tyr Thr Arg Tyr Trp Asn Asp Thr Ala Lys Val Pro Tyr Leu Tyr Asn
355 360 365

Ala Ser Asn Lys Arg Phe Ile Ser Tyr Asp Asp Ala Glu Ser Val Gly
370 375 380

Tyr Lys Thr Ala Tyr Ile Lys Ser Lys Gly Leu Gly Gly Ala Met Phe
385 390 395 400

Trp Glu Leu Ser Gly Asp Arg Asn Lys Thr Leu Gln Asn Lys Leu Lys
405 410 415

Ala Asp Leu



<210> 40 
<211> 317 
<212> PRT 

<213> Candida antarctica 
<400> 40 

Leu Pro Ser Gly Ser Asp Pro Ala Phe Ser Gln Pro Lys Ser Val Leu
1 5 10 15

Asp Ala Gly Leu Thr Cys Gln Gly Ala Ser Pro Ser Ser Val Ser Lys
20 25 30

Pro Ile Leu Leu Val Pro Gly Thr Gly Thr Thr Gly Pro Gln Ser Phe
35 40 45

Asp Ser Asn Trp Ile Pro Leu Ser Thr Gln Leu Gly Tyr Thr Pro Cys
50 55 60

Trp Ile Ser Pro Pro Pro Phe Met Leu Asn Asp Thr Gln Val Asn Thr
65 70 75 80

Glu Tyr Met Val Asn Ala Ile Thr Ala Leu Tyr Ala Gly Ser Gly Asn
85 90 95

Asn Lys Leu Pro Val Leu Thr Trp Ser Gln Gly Gly Leu Val Ala Gln
100 105 110

Trp Gly Leu Thr Phe Phe Pro Ser Ile Arg Ser Lys Val Asp Arg Leu
115 120 125

Met Ala Phe Ala Pro Asp Tyr Lys Gly Thr Val Leu Ala Gly Pro Leu
130 135 140

Asp Ala Leu Ala Val Ser Ala Pro Ser Val Trp Gln Gln Thr Thr Gly
145 150 155 160

Ser Ala Leu Thr Thr Ala Leu Arg Asn Ala Gly Gly Leu Thr Gln Ile
165 170 175

Val Pro Thr Thr Asn Leu Tyr Ser Ala Thr Asp Glu Ile Val Gln Pro
180 185 190

Gln Val Ser Asn Ser Pro Leu Asp Ser Ser Tyr Leu Phe Asn Gly Lys
195 200 205

Asn Val Gln Ala Gln Ala Val Cys Gly Pro Leu Phe Val Ile Asp His
210 215 220

Ala Gly Ser Leu Thr Ser Gln Phe Ser Tyr Val Val Gly Arg Ser Ala
225 230 235 240

Leu Arg Ser Thr Thr Gly Gln Ala Arg Ser Ala Asp Tyr Gly Ile Thr
245 250 255

Asp Cys Asn Pro Leu Pro Ala Asn Asp Leu Thr Pro Glu Gln Lys Val
260 265 270

Ala Ala Ala Ala Leu Leu Ala Pro Ala Ala Ala Ala Ile Val Ala Gly
275 280 285

Pro Lys Gln Asn Cys Glu Pro Asp Leu Met Pro Tyr Ala Arg Pro Phe
290 295 300

Ala Val Gly Lys Arg Thr Cys Ser Gly Ile Val Thr Pro
305 310 315



<210> 41 

<211> 434 

<212> PRT 

<213> artificial sequence 
<220> 

<223> chimera of guinea pig and Homo sapiens (human = approx. last 30 amino acids)

<400> 41 

Ala Glu Val Cys Tyr Ser His Leu Gly Cys Phe Ser Asp Glu Lys Pro
1 5 10 15

Trp Ala Gly Thr Ser Gln Arg Pro Ile Lys Ser Leu Pro Ser Asp Pro
20 25 30

Lys Lys Ile Asn Thr Arg Phe Leu Leu Tyr Thr Asn Glu Asn Gln Asn
35 40 45

Ser Tyr Gln Leu Ile Thr Ala Thr Asp Ile Ala Thr Ile Lys Ala Ser
50 55 60

Asn Phe Asn Leu Asn Arg Lys Thr Arg Phe Ile Ile His Gly Phe Thr
65 70 75 80

Asp Ser Gly Glu Asn Ser Trp Leu Ser Asp Met Cys Lys Asn Met Phe
85 90 95

Gln Val Glu Lys Val Asn Cys Ile Cys Val Asp Trp Lys Gly Gly Ser
100 105 110

Lys Ala Gln Tyr Ser Gln Ala Ser Gln Asn Ile Arg Val Val Gly Ala
115 120 125

Glu Val Ala Tyr Leu Val Gln Val Leu Ser Thr Ser Leu Asn Tyr Ala
130 135 140

Pro Glu Asn Val His Ile Ile Gly His Ser Leu Gly Ala His Thr Ala
145 150 155 160

Gly Glu Ala Gly Lys Arg Leu Asn Gly Leu Val Gly Arg Ile Thr Gly
165 170 175

Leu Asp Pro Ala Glu Pro Tyr Phe Gln Asp Thr Pro Glu Glu Val Arg
180 185 190

Leu Asp Pro Ser Asp Ala Lys Phe Val Asp Val Ile His Thr Asp Ile
195 200 205

Ser Pro Ile Leu Pro Ser Leu Gly Phe Gly Met Ser Gln Lys Val Gly
210 215 220

His Met Asp Phe Phe Pro Asn Gly Gly Lys Asp Met Pro Gly Cys Lys
225 230 235 240

Thr Gly Ile Ser Cys Asn His His Arg Ser Ile Glu Tyr Tyr His Ser
245 250 255

Ser Ile Leu Asn Pro Glu Gly Phe Leu Gly Tyr Pro Cys Ala Ser Tyr
260 265 270

Asp Glu Phe Gln Glu Ser Gly Cys Phe Pro Cys Pro Ala Lys Gly Cys
275 280 285

Pro Lys Met Gly His Phe Ala Asp Gln Tyr Pro Gly Lys Thr Asn Ala
290 295 300

Val Glu Gln Thr Phe Phe Leu Asn Thr Gly Ala Ser Asp Asn Phe Thr
305 310 315 320

Arg Trp Arg Tyr Lys Val Thr Val Thr Leu Ser Gly Glu Lys Asp Pro
325 330 335

Ser Gly Asn Ile Asn Val Ala Leu Leu Gly Lys Asn Gly Asn Ser Ala
340 345 350

Gln Tyr Gln Val Phe Lys Gly Thr Leu Lys Pro Asp Ala Ser Tyr Thr
355 360 365

Asn Ser Ile Asp Val Glu Leu Asn Val Gly Thr Ile Gln Lys Val Thr
370 375 380

Phe Leu Trp Lys Arg Ser Gly Ile Ser Val Ser Lys Pro Lys Met Gly
385 390 395 400

Ala Ser Arg Ile Thr Val Gln Ser Gly Lys Asp Gly Thr Lys Tyr Asn
405 410 415

Phe Cys Ser Ser Asp Ile Val Gln Glu Asn Val Glu Gln Thr Leu Ser
420 425 430

Pro Cys



<210> 42 

<211> 471 

<212> PRT 

<213> Escherichia coli 



<400> 42 

Met Lys Gln Ser Thr Ile Ala Leu Ala Leu Leu Pro Leu Leu Phe Thr
1 5 10 15

Pro Val Thr Lys Ala Arg Thr Pro Glu Met Pro Val Leu Glu Asn Arg
20 25 30

Ala Ala Gln Gly Asp Ile Thr Ala Pro Gly Gly Ala Arg Arg Leu Thr
35 40 45

Gly Asp Gln Thr Ala Ala Leu Arg Asp Ser Leu Ser Asp Lys Pro Ala
50 55 60

Lys Asn Ile Ile Leu Leu Ile Gly Asp Gly Met Gly Asp Ser Glu Ile
65 70 75 80

Thr Ala Ala Arg Asn Tyr Ala Glu Gly Ala Gly Gly Phe Phe Lys Gly
85 90 95

Ile Asp Ala Leu Pro Leu Thr Gly Gln Tyr Thr His Tyr Ala Leu Asn
100 105 110

Lys Lys Thr Gly Lys Pro Asp Tyr Val Thr Asp Ser Ala Ala Ser Ala
115 120 125

Thr Ala Trp Ser Thr Gly Val Lys Thr Tyr Asn Gly Ala Leu Gly Val
130 135 140

Asp Ile His Glu Lys Asp His Pro Thr Ile Leu Glu Met Ala Lys Ala
145 150 155 160

Ala Gly Leu Ala Thr Gly Asn Val Ser Thr Ala Glu Leu Gln Asp Ala
165 170 175

Thr Pro Ala Ala Leu Val Ala His Val Thr Ser Arg Lys Cys Tyr Gly
180 185 190

Pro Ser Ala Thr Ser Glu Lys Cys Pro Gly Asn Ala Leu Glu Lys Gly
195 200 205

Gly Lys Gly Ser Ile Thr Glu Gln Leu Leu Asn Ala Arg Ala Asp Val
210 215 220

Thr Leu Gly Gly Gly Ala Lys Thr Phe Ala Glu Thr Ala Thr Ala Gly
225 230 235 240

Glu Trp Gln Gly Lys Thr Leu Arg Glu Gln Ala Gln Ala Arg Gly Tyr
245 250 255

Gln Leu Val Ser Asp Ala Ala Ser Leu Asn Ser Val Thr Glu Ala Asn
260 265 270

Gln Gln Lys Pro Leu Leu Gly Leu Phe Ala Asp Gly Asn Met Pro Val
275 280 285

Arg Trp Leu Gly Pro Lys Ala Thr Tyr His Gly Asn Ile Asp Lys Pro
290 295 300

Ala Val Thr Cys Thr Pro Asn Pro Gln Arg Asn Asp Ser Val Pro Thr
305 310 315 320

Leu Ala Gln Met Thr Asp Lys Ala Ile Glu Leu Leu Ser Lys Asn Glu
325 330 335

Lys Gly Phe Phe Leu Gln Val Glu Gly Ala Ser Ile Asp Lys Gln Asp
340 345 350

His Ala Ala Asn Pro Cys Gly Gln Ile Gly Glu Thr Val Asp Leu Asp
355 360 365

Glu Ala Val Gln Arg Ala Leu Glu Phe Ala Lys Lys Glu Gly Asn Thr
370 375 380

Leu Val Ile Val Thr Ala Asp His Ala His Ala Ser Gln Ile Val Ala
385 390 395 400

Pro Asp Thr Lys Ala Pro Gly Leu Thr Gln Ala Leu Asn Thr Lys Asp
405 410 415

Gly Ala Val Met Val Met Ser Tyr Gly Asn Ser Glu Glu Asp Ser Gln
420 425 430

Glu His Thr Gly Ser Gln Leu Arg Ile Ala Ala Tyr Gly Pro His Ala
435 440 445

Ala Asn Val Val Gly Leu Thr Asp Gln Thr Asp Leu Phe Tyr Thr Met
450 455 460

Lys Ala Ala Leu Gly Leu Lys
465 470



<210> 43 

<211> 260 

<212> PRT 

<213> Bovine 

<400> 43 

Leu Lys Ile Ala Ala Phe Asn Ile Arg Thr Phe Gly Glu Thr Lys Met
1 5 10 15

Ser Asn Ala Thr Leu Ala Ser Tyr Ile Val Arg Ile Val Arg Arg Tyr
20 25 30

Asp Ile Val Leu Ile Gln Glu Val Arg Asp Ser His Leu Val Ala Val
35 40 45

Gly Lys Leu Leu Asp Tyr Leu Asn Gln Asp Asp Pro Asn Thr Tyr His
50 55 60

Tyr Val Val Ser Glu Pro Leu Gly Arg Asn Ser Tyr Lys Glu Arg Tyr
65 70 75 80

Leu Phe Leu Phe Arg Pro Asn Lys Val Ser Val Leu Asp Thr Tyr Gln
85 90 95

Tyr Asp Asp Gly Cys Glu Ser Cys Gly Asn Asp Ser Phe Ser Arg Glu
100 105 110

Pro Ala Val Val Lys Phe Ser Ser His Ser Thr Lys Val Lys Glu Phe
115 120 125

Ala Ile Val Ala Leu His Ser Ala Pro Ser Asp Ala Val Ala Glu Ile
130 135 140

Asn Ser Leu Tyr Asp Val Tyr Leu Asp Val Gln Gln Lys Trp His Leu
145 150 155 160

Asn Asp Val Met Leu Met Gly Asp Phe Asn Ala Asp Cys Ser Tyr Val
165 170 175

Thr Ser Ser Gln Trp Ser Ser Ile Arg Leu Arg Thr Ser Ser Thr Phe
180 185 190

Gln Trp Leu Ile Pro Asp Ser Ala Asp Thr Thr Ala Thr Ser Thr Asn
195 200 205

Cys Ala Tyr Asp Arg Ile Val Val Ala Gly Ser Leu Leu Gln Ser Ser
210 215 220

Val Val Pro Gly Ser Ala Ala Pro Phe Asp Phe Gln Ala Ala Tyr Gly
225 230 235 240

Leu Ser Asn Glu Met Ala Leu Ala Ile Ser Asp His Tyr Pro Val Glu
245 250 255

Val Thr Leu Thr
260



<210> 44 
<211> 686
<212> PRT 

<213> Bacillus circulans 
<400> 44 

Ala Pro Asp Thr Ser Val Ser Asn Lys Gln Asn Phe Ser Thr Asp Val
1 5 10 15

Ile Tyr Gln Ile Phe Thr Asp Arg Phe Ser Asp Gly Asn Pro Ala Asn
20 25 30

Asn Pro Thr Gly Ala Ala Phe Asp Gly Thr Cys Thr Asn Leu Arg Leu
35 40 45

Tyr Cys Gly Gly Asp Trp Gln Gly Ile Ile Asn Lys Ile Asn Asp Gly
50 55 60

Tyr Leu Thr Gly Met Gly Val Thr Ala Ile Trp Ile Ser Gln Pro Val
65 70 75 80

Glu Asn Ile Tyr Ser Ile Ile Asn Tyr Ser Gly Val Asn Asn Thr Ala
85 90 95

Tyr His Gly Tyr Trp Ala Arg Asp Phe Lys Lys Thr Asn Pro Ala Tyr
100 105 110

Gly Thr Ile Ala Asp Phe Gln Asn Leu Ile Ala Ala Ala His Ala Lys
115 120 125

Asn Ile Lys Val Ile Ile Asp Phe Ala Pro Asn His Thr Ser Pro Ala
130 135 140

Ser Ser Asp Gln Pro Ser Phe Ala Glu Asn Gly Arg Leu Tyr Asp Asn
145 150 155 160

Gly Thr Leu Leu Gly Gly Tyr Thr Asn Asp Thr Gln Asn Leu Phe His
165 170 175

His Asn Gly Gly Thr Asp Phe Ser Thr Thr Glu Asn Gly Ile Tyr Lys
180 185 190

Asn Leu Tyr Asp Leu Ala Asp Leu Asn His Asn Asn Ser Thr Val Asp
195 200 205

Val Tyr Leu Lys Asp Ala Ile Lys Met Trp Leu Asp Leu Gly Ile Asp
210 215 220

Gly Ile Arg Met Asp Ala Val Lys His Met Pro Phe Gly Trp Gln Lys
225 230 235 240

Ser Phe Met Ala Ala Val Asn Asn Tyr Lys Pro Val Phe Thr Phe Gly
245 250 255

Glu Trp Phe Leu Gly Val Asn Glu Val Ser Pro Glu Asn His Lys Phe
260 265 270

Ala Asn Glu Ser Gly Met Ser Leu Leu Asp Phe Arg Phe Ala Gln Lys
275 280 285

Val Arg Gln Val Phe Arg Asp Asn Thr Asp Asn Met Tyr Gly Leu Lys
290 295 300

Ala Met Leu Glu Gly Ser Ala Ala Asp Tyr Ala Gln Val Asp Asp Gln
305 310 315 320

Val Thr Phe Ile Asp Asn His Asp Met Glu Arg Phe His Ala Ser Asn
325 330 335

Ala Asn Arg Arg Lys Leu Glu Gln Ala Leu Ala Phe Thr Leu Thr Ser
340 345 350

Arg Gly Val Pro Ala Ile Tyr Tyr Gly Thr Glu Gln Tyr Met Ser Gly
355 360 365

Gly Thr Asp Pro Asp Asn Arg Ala Arg Ile Pro Ser Phe Ser Thr Ser
370 375 380

Thr Thr Ala Tyr Gln Val Ile Gln Lys Leu Ala Pro Leu Arg Lys Cys
385 390 395 400

Asn Pro Ala Ile Ala Tyr Gly Ser Thr Gln Glu Arg Trp Ile Asn Asn
405 410 415

Asp Val Leu Ile Tyr Glu Arg Lys Phe Gly Ser Asn Val Ala Val Val
420 425 430

Ala Val Asn Arg Asn Leu Asn Ala Pro Ala Ser Ile Ser Gly Leu Val
435 440 445

Thr Ser Leu Pro Gln Gly Ser Tyr Asn Asp Val Leu Gly Gly Leu Leu
450 455 460

Asn Gly Asn Thr Leu Ser Val Gly Ser Gly Gly Ala Ala Ser Asn Phe
465 470 475 480

Thr Leu Ala Ala Gly Gly Thr Ala Val Trp Gln Tyr Thr Ala Ala Thr
485 490 495

Ala Thr Pro Thr Ile Gly His Val Gly Pro Met Met Ala Lys Pro Gly
500 505 510

Val Thr Ile Thr Ile Asp Gly Arg Gly Phe Gly Ser Ser Lys Gly Thr
515 520 525

Val Tyr Phe Gly Thr Thr Ala Val Ser Gly Ala Asp Ile Thr Ser Trp
530 535 540

Glu Asp Thr Gln Ile Lys Val Lys Ile Pro Ala Val Ala Gly Gly Asn
545 550 555 560

Tyr Asn Ile Lys Val Ala Asn Ala Ala Gly Thr Ala Ser Asn Val Tyr
565 570 575

Asp Asn Phe Glu Val Leu Ser Gly Asp Gln Val Ser Val Arg Phe Val
580 585 590

Val Asn Asn Ala Thr Thr Ala Leu Gly Gln Asn Val Tyr Leu Thr Gly
595 600 605

Ser Val Ser Glu Leu Gly Asn Trp Asp Pro Ala Lys Ala Ile Gly Pro
610 615 620

Met Tyr Asn Gln Val Val Tyr Gln Tyr Pro Asn Trp Tyr Tyr Asp Val
625 630 635 640

Ser Val Pro Ala Gly Lys Thr Ile Glu Phe Lys Phe Leu Lys Lys Gln
645 650 655

Gly Ser Thr Val Thr Trp Glu Gly Gly Ser Asn His Thr Phe Thr Ala
660 665 670

Pro Ser Ser Gly Thr Ala Thr Ile Asn Val Asn Trp Gln Pro
675 680 685
<210> 45

<211> 404 

<212> PRT 

<213> Amycolatopsis orientalis 



<400> 45 

Met Arg Val Leu Ile Thr Gly Cys Gly Ser Arg Gly Asp Thr Glu Pro
1 5 10 15

Leu Val Ala Leu Ala Ala Arg Leu Arg Glu Leu Gly Ala Asp Ala Arg
20 25 30

Met Cys Leu Pro Pro Asp Tyr Val Glu Arg Cys Ala Glu Val Gly Val
35 40 45

Pro Met Val Pro Val Gly Arg Ala Val Arg Ala Gly Ala Arg Glu Pro
50 55 60

Gly Glu Leu Pro Pro Gly Ala Ala Glu Val Val Thr Glu Val Val Ala
65 70 75 80

Glu Trp Phe Asp Lys Val Pro Ala Ala Ile Glu Gly Cys Asp Ala Val
85 90 95

Val Thr Thr Gly Leu Leu Pro Ala Ala Val Ala Val Arg Ser Met Ala
100 105 110

Glu Lys Leu Gly Ile Pro Tyr Arg Tyr Thr Val Leu Ser Pro Asp His
115 120 125

Leu Pro Ser Glu Gln Ser Gln Ala Glu Arg Asp Met Tyr Asn Gln Gly
130 135 140

Ala Asp Arg Leu Phe Gly Asp Ala Val Asn Ser His Arg Ala Ser Ile
145 150 155 160

Gly Leu Pro Pro Val Glu His Leu Tyr Asp Tyr Gly Tyr Thr Asp Gln
165 170 175

Pro Trp Leu Ala Ala Asp Pro Val Leu Ser Pro Leu Arg Pro Thr Asp
180 185 190

Leu Gly Thr Val Gln Thr Gly Ala Trp Ile Leu Pro Asp Glu Arg Pro
195 200 205

Leu Ser Ala Glu Leu Glu Ala Phe Leu Ala Ala Gly Ser Thr Pro Val
210 215 220

Tyr Val Gly Phe Gly Ser Ser Ser Arg Pro Ala Thr Ala Asp Ala Ala
225 230 235 240

Lys Met Ala Ile Lys Ala Val Arg Ala Ser Gly Arg Arg Ile Val Leu
245 250 255

Ser Arg Gly Trp Ala Asp Leu Val Leu Pro Asp Asp Gly Ala Asp Cys
260 265 270

Phe Val Val Gly Glu Val Asn Leu Gln Glu Leu Phe Gly Arg Val Ala
275 280 285

Ala Ala Ile His His Asp Ser Ala Gly Thr Thr Leu Leu Ala Met Arg
290 295 300

Ala Gly Ile Pro Gln Ile Val Val Arg Arg Val Val Asp Asn Val Val
305 310 315 320

Glu Gln Ala Tyr His Ala Asp Arg Val Ala Glu Leu Gly Val Gly Val
325 330 335

Ala Val Asp Gly Pro Val Pro Thr Ile Asp Ser Leu Ser Ala Ala Leu
340 345 350

Asp Thr Ala Leu Ala Pro Glu Ile Arg Ala Arg Ala Thr Thr Val Ala
355 360 365

Asp Thr Ile Arg Ala Asp Gly Thr Thr Val Ala Ala Gln Leu Leu Phe
370 375 380

Asp Ala Val Ser Leu Glu Lys Pro Thr Val Pro Ala Leu Glu His His
385 390 395 400

His His His His



<210> 46 
<211> 292 
<212> PRT 

<213> Pseudomonas sp. 
<400> 46 

Ser Ile Glu Arg Leu Gly Tyr Leu Gly Phe Ala Val Lys Asp Val Pro
1 5 10 15

Ala Trp Asp His Phe Leu Thr Lys Ser Val Gly Leu Met Ala Ala Gly
20 25 30

Ser Ala Gly Asp Ala Ala Leu Tyr Arg Ala Asp Gln Arg Ala Trp Arg
35 40 45

Ile Ala Val Gln Pro Gly Glu Leu Asp Asp Leu Ala Tyr Ala Gly Leu
50 55 60

Glu Val Asp Asp Ala Ala Ala Leu Glu Arg Met Ala Asp Lys Leu Arg
65 70 75 80

Gln Ala Gly Val Ala Phe Thr Arg Gly Asp Glu Ala Leu Met Gln Gln
85 90 95

Arg Lys Val Met Gly Leu Leu Cys Leu Gln Asp Pro Phe Gly Leu Pro
100 105 110

Leu Glu Ile Tyr Tyr Gly Pro Ala Glu Ile Phe His Glu Pro Phe Leu
115 120 125

Pro Ser Ala Pro Val Ser Gly Phe Val Thr Gly Asp Gln Gly Ile Gly
130 135 140

His Phe Val Arg Cys Val Pro Asp Thr Ala Lys Ala Met Ala Phe Tyr
145 150 155 160

Thr Glu Val Leu Gly Phe Val Leu Ser Asp Ile Ile Asp Ile Gln Met
165 170 175

Gly Pro Glu Thr Ser Val Pro Ala His Phe Leu His Cys Asn Gly Arg
180 185 190

His His Thr Ile Ala Leu Ala Ala Phe Pro Ile Pro Lys Arg Ile His
195 200 205

His Phe Met Leu Gln Ala Asn Thr Ile Asp Asp Val Gly Tyr Ala Phe
210 215 220

Asp Arg Leu Asp Ala Ala Gly Arg Ile Thr Ser Leu Leu Gly Arg His
225 230 235 240

Thr Asn Asp Gln Thr Leu Ser Phe Tyr Ala Asp Thr Pro Ser Pro Met
245 250 255

Ile Glu Val Glu Phe Gly Trp Gly Pro Arg Thr Val Asp Ser Ser Trp
260 265 270

Thr Val Ala Arg His Ser Arg Thr Ala Met Trp Gly His Lys Ser Val
275 280 285

Arg Gly Gln Arg
290



<210> 47 
<211> 311 
<212> PRT 

<213> Acinetobacter sp.
<400> 47 

Met Glu Val Lys Ile Phe Asn Thr Gln Asp Val Gln Asp Phe Leu Arg
1 5 10 15

Val Ala Ser Gly Leu Glu Gln Glu Gly Gly Asn Pro Arg Val Lys Gln
20 25 30

Ile Ile His Arg Val Leu Ser Asp Leu Tyr Lys Ala Ile Glu Asp Leu
35 40 45

Asn Ile Thr Ser Asp Glu Tyr Trp Ala Gly Val Ala Tyr Leu Asn Gln
50 55 60

Leu Gly Ala Asn Gln Glu Ala Gly Leu Leu Ser Pro Gly Leu Gly Phe
65 70 75 80

Asp His Tyr Leu Asp Met Arg Met Asp Ala Glu Asp Ala Ala Leu Gly
85 90 95

Ile Glu Asn Ala Thr Pro Arg Thr Ile Glu Gly Pro Leu Tyr Val Ala
100 105 110

Gly Ala Pro Glu Ser Val Gly Tyr Ala Arg Met Asp Asp Gly Ser Asp
115 120 125

Pro Asn Gly His Thr Leu Ile Leu His Gly Thr Ile Phe Asp Ala Asp
130 135 140

Gly Lys Pro Leu Pro Asn Ala Lys Val Glu Ile Trp His Ala Asn Thr
145 150 155 160

Lys Gly Phe Tyr Ser His Phe Asp Pro Thr Gly Glu Gln Gln Ala Phe
165 170 175

Asn Met Arg Arg Ser Ile Ile Thr Asp Glu Asn Gly Gln Tyr Arg Val
180 185 190

Arg Thr Ile Leu Pro Ala Gly Tyr Gly Cys Pro Pro Glu Gly Pro Thr
195 200 205

Gln Gln Leu Leu Asn Gln Leu Gly Arg His Gly Asn Arg Pro Ala His
210 215 220

Ile His Tyr Phe Val Ser Ala Asp Gly His Arg Lys Leu Thr Thr Gln
225 230 235 240

Ile Asn Val Ala Gly Asp Pro Tyr Thr Tyr Asp Asp Phe Ala Tyr Ala
245 250 255

Thr Arg Glu Gly Leu Val Val Asp Ala Val Glu His Thr Asp Pro Glu
260 265 270

Ala Ile Lys Ala Asn Asp Val Glu Gly Pro Phe Ala Glu Met Val Phe
275 280 285

Asp Leu Lys Leu Thr Arg Leu Val Asp Gly Val Asp Asn Gln Val Val
290 295 300

Asp Arg Pro Arg Leu Ala Val
305 310



<210> 48 
<211> 414 
<212> PRT 

<213> Pseudomonas putida 
<400> 48 

Thr Thr Glu Thr Ile Gln Ser Asn Ala Asn Leu Ala Pro Leu Pro Pro
1 5 10 15

His Val Pro Glu His Leu Val Phe Asp Phe Asp Met Tyr Asn Pro Ser
20 25 30

Asn Leu Ser Ala Gly Val Gln Glu Ala Trp Ala Val Leu Gln Glu Ser
35 40 45

Asn Val Pro Asp Leu Val Trp Thr Arg Cys Asn Gly Gly His Trp Ile
50 55 60

Ala Thr Arg Gly Gln Leu Ile Arg Glu Ala Tyr Glu Asp Tyr Arg His
65 70 75 80

Phe Ser Ser Glu Cys Pro Phe Ile Pro Arg Glu Ala Gly Glu Ala Tyr
85 90 95

Asp Phe Ile Pro Thr Ser Met Asp Pro Pro Glu Gln Arg Gln Phe Arg
100 105 110

Ala Leu Ala Asn Gln Val Val Gly Met Pro Val Val Asp Lys Leu Glu
115 120 125

Asn Arg Ile Gln Glu Leu Ala Cys Ser Leu Ile Glu Ser Leu Arg Pro
130 135 140

Gln Gly Gln Cys Asn Phe Thr Glu Asp Tyr Ala Glu Pro Phe Pro Ile
145 150 155 160

Arg Ile Phe Met Leu Leu Ala Gly Leu Pro Glu Glu Asp Ile Pro His
165 170 175

Leu Lys Tyr Leu Thr Asp Gln Met Thr Arg Pro Asp Gly Ser Met Thr
180 185 190

Phe Ala Glu Ala Lys Glu Ala Leu Tyr Asp Tyr Leu Ile Pro Ile Ile
195 200 205

Glu Gln Arg Arg Gln Lys Pro Gly Thr Asp Ala Ile Ser Ile Val Ala
210 215 220

Asn Gly Gln Val Asn Gly Arg Pro Ile Thr Ser Asp Glu Ala Lys Arg
225 230 235 240

Met Cys Gly Leu Leu Leu Val Gly Gly Leu Asp Thr Val Val Asn Phe
245 250 255

Leu Ser Phe Ser Met Glu Phe Leu Ala Lys Ser Pro Glu His Arg Gln
260 265 270

Glu Leu Ile Gln Arg Pro Glu Arg Ile Pro Ala Ala Cys Glu Glu Leu
275 280 285

Leu Arg Arg Phe Ser Leu Val Ala Asp Gly Arg Ile Leu Thr Ser Asp
290 295 300

Tyr Glu Phe His Gly Val Gln Leu Lys Lys Gly Asp Gln Ile Leu Leu
305 310 315 320

Pro Gln Met Leu Ser Gly Leu Asp Glu Arg Glu Asn Ala Cys Pro Met
325 330 335

His Val Asp Phe Ser Arg Gln Lys Val Ser His Thr Thr Phe Gly His
340 345 350

Gly Ser His Leu Cys Leu Gly Gln His Leu Ala Arg Arg Glu Ile Ile
355 360 365

Val Thr Leu Lys Glu Trp Leu Thr Arg Ile Pro Asp Phe Ser Ile Ala
370 375 380

Pro Gly Ala Gln Ile Gln His Lys Ser Gly Ile Val Ser Gly Val Gln
385 390 395 400

Ala Leu Pro Leu Val Trp Asp Pro Ala Thr Thr Lys Ala Val
405 410



<210> 49 

<211> 374 

<212> PRT 

<213> Equus caballus 



<400> 49 

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



<210> 50 

<211> 297 

<212> PRT 

<213> Escherichia coli 



<400> 50 

Met Ala Thr Asn Leu Arg Gly Val Met Ala Ala Leu Leu Thr Pro Phe
1 5 10 15

Asp Gln Gln Gln Ala Leu Asp Lys Ala Ser Leu Arg Arg Leu Val Gln
20 25 30

Phe Asn Ile Gln Gln Gly Ile Asp Gly Leu Tyr Val Gly Gly Ser Thr
35 40 45

Gly Glu Ala Phe Val Gln Ser Leu Ser Glu Arg Glu Gln Val Leu Glu
50 55 60

Ile Val Ala Glu Glu Gly Lys Gly Lys Ile Lys Leu Ile Ala His Val
65 70 75 80

Gly Cys Val Thr Thr Ala Glu Ser Gln Gln Leu Ala Ala Ser Ala Lys
85 90 95

Arg Tyr Gly Phe Asp Ala Val Ser Ala Val Thr Pro Phe Tyr Tyr Pro
100 105 110

Phe Ser Phe Glu Glu His Cys Asp His Tyr Arg Ala Ile Ile Asp Ser
115 120 125

Ala Asp Gly Leu Pro Met Val Val Tyr Asn Ile Pro Ala Leu Ser Gly
130 135 140

Val Lys Leu Thr Leu Asp Gln Ile Asn Thr Leu Val Thr Leu Pro Gly
145 150 155 160

Val Gly Ala Leu Lys Gln Thr Ser Gly Asp Leu Tyr Gln Met Glu Gln
165 170 175

Ile Arg Arg Glu His Pro Asp Leu Val Leu Tyr Asn Gly Tyr Asp Glu
180 185 190

Ile Phe Ala Ser Gly Leu Leu Ala Gly Ala Asp Gly Gly Ile Gly Ser
195 200 205

Thr Tyr Asn Ile Met Gly Trp Arg Tyr Gln Gly Ile Val Lys Ala Leu
210 215 220

Lys Glu Gly Asp Ile Gln Thr Ala Gln Lys Leu Gln Thr Glu Cys Asn
225 230 235 240

Lys Val Ile Asp Leu Leu Ile Lys Thr Gly Val Phe Arg Gly Leu Lys
245 250 255

Thr Val Leu His Tyr Met Asp Val Val Ser Val Pro Leu Cys Arg Lys
260 265 270

Pro Phe Gly Pro Val Asp Glu Lys Tyr Leu Pro Glu Leu Lys Ala Leu
275 280 285

Ala Gln Gln Leu Met Gln Glu Arg Gly
290 295



<210> 51 
<211> 268 
<212> PRT 

<213> Salmonella typhimurium 
<400> 51 

Met Glu Arg Tyr Glu Asn Leu Phe Ala Gln Leu Asn Asp Arg Arg Glu
1 5 10 15

Gly Ala Phe Val Pro Phe Val Thr Leu Gly Asp Pro Gly Ile Glu Gln
20 25 30

Ser Leu Lys Ile Ile Asp Thr Leu Ile Asp Ala Gly Ala Asp Ala Leu
35 40 45

Glu Leu Gly Val Pro Phe Ser Asp Pro Leu Ala Asp Gly Pro Thr Ile
50 55 60

Gln Asn Ala Asn Leu Arg Ala Phe Ala Ala Gly Val Thr Pro Ala Gln
65 70 75 80

Cys Phe Glu Met Leu Ala Leu Ile Arg Glu Lys His Pro Thr Ile Pro
85 90 95

Ile Gly Leu Leu Met Tyr Ala Asn Leu Val Phe Asn Asn Gly Ile Asp
100 105 110

Ala Phe Tyr Ala Arg Cys Glu Gln Val Gly Val Asp Ser Val Leu Val
115 120 125

Ala Asp Val Pro Val Glu Glu Ser Ala Pro Phe Arg Gln Ala Ala Leu
130 135 140

Arg His Asn Ile Ala Pro Ile Phe Ile Cys Pro Pro Asn Ala Asp Asp
145 150 155 160

Asp Leu Leu Arg Gln Val Ala Ser Tyr Gly Arg Gly Tyr Thr Tyr Leu
165 170 175

Leu Ser Arg Ser Gly Val Thr Gly Ala Glu Asn Arg Gly Ala Leu Pro
180 185 190

Leu His His Leu Ile Glu Lys Leu Lys Glu Tyr His Ala Ala Pro Ala
195 200 205

Leu Gln Gly Phe Gly Ile Ser Ser Pro Glu Gln Val Ser Ala Ala Val
210 215 220

Arg Ala Gly Ala Ala Gly Ala Ile Ser Gly Ser Ala Ile Val Lys Ile
225 230 235 240

Ile Glu Lys Asn Leu Ala Ser Pro Lys Gln Met Leu Ala Glu Leu Arg
245 250 255

Ser Phe Val Ser Ala Met Lys Ala Ala Ser Arg Ala
260 265



<210> 52 
<211> 393 
<212> PRT 

<213> Actinoplanes missouriensis 
<400> 52 

Ser Val Gln Ala Thr Arg Glu Asp Lys Phe Ser Phe Gly Leu Trp Thr
1 5 10 15

Val Gly Trp Gln Ala Arg Asp Ala Phe Gly Asp Ala Thr Arg Thr Ala
20 25 30

Leu Asp Pro Val Glu Ala Val His Lys Leu Ala Glu Ile Gly Ala Tyr
35 40 45

Gly Ile Thr Phe His Asp Asp Asp Leu Val Pro Phe Gly Ser Asp Ala
50 55 60

Gln Thr Arg Asp Gly Ile Ile Ala Gly Phe Lys Lys Ala Leu Asp Glu
65 70 75 80

Thr Gly Leu Ile Val Pro Met Val Thr Thr Asn Leu Phe Thr His Pro
85 90 95

Val Phe Lys Asp Gly Gly Phe Thr Ser Asn Asp Arg Ser Val Arg Arg
100 105 110

Tyr Ala Ile Arg Lys Val Leu Arg Gln Met Asp Leu Gly Ala Glu Leu
115 120 125

Gly Ala Lys Thr Leu Val Leu Trp Gly Gly Arg Glu Gly Ala Glu Tyr
130 135 140

Asp Ser Ala Lys Asp Val Ser Ala Ala Leu Asp Arg Tyr Arg Glu Ala
145 150 155 160

Leu Asn Leu Leu Ala Gln Tyr Ser Glu Asp Arg Gly Tyr Gly Leu Arg
165 170 175

Phe Ala Ile Glu Pro Lys Pro Asn Glu Pro Arg Gly Asp Ile Leu Leu
180 185 190

Pro Thr Ala Gly His Ala Ile Ala Phe Val Gln Glu Leu Glu Arg Pro
195 200 205

Glu Leu Phe Gly Ile Asn Pro Glu Thr Gly Asn Glu Gln Met Ser Asn
210 215 220

Leu Asn Phe Thr Gln Gly Ile Ala Gln Ala Leu Trp His Lys Lys Leu
225 230 235 240

Phe His Ile Asp Leu Asn Gly Gln His Gly Pro Lys Phe Asp Gln Asp
245 250 255

Leu Val Phe Gly His Gly Asp Leu Leu Asn Ala Phe Ser Leu Val Asp
260 265 270

Leu Leu Glu Asn Gly Pro Asp Gly Ala Pro Ala Tyr Asp Gly Pro Arg
275 280 285

His Phe Asp Tyr Lys Pro Ser Arg Thr Glu Asp Tyr Asp Gly Val Trp
290 295 300

Glu Ser Ala Lys Ala Asn Ile Arg Met Tyr Leu Leu Leu Lys Glu Arg
305 310 315 320

Ala Lys Ala Phe Arg Ala Asp Pro Glu Val Gln Glu Ala Leu Ala Ala
325 330 335

Ser Lys Val Ala Glu Leu Lys Thr Pro Thr Leu Asn Pro Gly Glu Gly
340 345 350

Tyr Ala Glu Leu Leu Ala Asp Arg Ser Ala Phe Glu Asp Tyr Asp Ala
355 360 365

Asp Ala Val Gly Ala Lys Gly Phe Gly Phe Val Lys Leu Asn Gln Leu
370 375 380

Ala Ile Glu His Leu Leu Gly Ala Arg
385 390



<210> 53 
<211> 348 
<212> PRT 
<213> Bacteriophage T7



<400> 53 

Val Asn Ile Lys Thr Asn Pro Phe Lys Ala Val Ser Phe Val Glu Ser
1 5 10 15

Ala Ile Lys Lys Ala Leu Asp Asn Ala Gly Tyr Leu Ile Ala Glu Ile
20 25 30

Lys Tyr Asp Gly Val Arg Gly Asn Ile Cys Val Asp Asn Thr Ala Asn
35 40 45

Ser Tyr Trp Leu Ser Arg Val Ser Lys Thr Ile Pro Ala Leu Glu His
50 55 60

Leu Asn Gly Phe Asp Val Arg Trp Lys Arg Leu Leu Asn Asp Asp Arg
65 70 75 80

Cys Phe Tyr Lys Asp Gly Phe Met Leu Asp Gly Glu Leu Met Val Lys
85 90 95

Gly Val Asp Phe Asn Thr Gly Ser Gly Leu Leu Arg Thr Lys Trp Thr
100 105 110

Asp Thr Lys Asn Gln Glu Phe His Glu Glu Leu Phe Val Glu Pro Ile
115 120 125

Arg Lys Lys Asp Lys Val Pro Phe Lys Leu His Thr Gly His Leu His
130 135 140

Ile Lys Leu Tyr Ala Ile Leu Pro Leu His Ile Val Glu Ser Gly Glu
145 150 155 160

Asp Cys Asp Val Met Thr Leu Leu Met Gln Glu His Val Lys Asn Met
165 170 175

Leu Pro Leu Leu Gln Glu Tyr Phe Pro Glu Ile Glu Trp Gln Ala Ala
180 185 190

Glu Ser Tyr Glu Val Tyr Asp Met Val Glu Leu Gln Gln Leu Tyr Glu
195 200 205

Gln Lys Arg Ala Glu Gly His Glu Gly Leu Ile Val Lys Asp Pro Met
210 215 220

Cys Ile Tyr Lys Arg Gly Lys Lys Ser Gly Trp Trp Lys Met Lys Pro
225 230 235 240

Glu Asn Glu Ala Asp Gly Ile Ile Gln Gly Leu Val Trp Gly Thr Lys
245 250 255

Gly Leu Ala Asn Glu Gly Lys Val Ile Gly Phe Glu Val Leu Leu Glu
260 265 270

Ser Gly Arg Leu Val Asn Ala Thr Asn Ile Ser Arg Ala Leu Met Asp
275 280 285

Glu Phe Thr Glu Thr Val Lys Glu Ala Thr Leu Ser Gln Trp Gly Phe
290 295 300

Phe Ser Pro Tyr Gly Ile Gly Asp Asn Asp Ala Cys Thr Ile Asn Pro
305 310 315 320

Tyr Asp Gly Trp Ala Cys Gln Ile Ser Tyr Met Glu Glu Thr Pro Asp
325 330 335

Gly Ser Leu Arg His Pro Ser Phe Val Met Phe Arg
340 345



<210> 54 

<211> 42 

<212> DNA 

<213> artificial sequence 
<220> 

<223> binding site for restr1 and restr2 
<220> 

<221> CDS 

<222> (2)..(40) 
<223> 

<400> 54 

g gtg gta tca gca ggc cac tgc tac aag tcc cgc atc cag gt 42 

Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln 
1 5 10 



<210> 55 

<211> 13 

<212> PRT 

<213> artificial sequence 
<220> 

<223> binding site for restr1 and restr2 

<400> 55 

Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln 
1 5 10 

<210> 56 

<211> 42 

<212> DNA 

<213> artificial sequence 
<220> 

<223> forward primer restr1 

<400> 56 

ggtggtatcc gcgggccact gctacaagtc ccggatccag gt 42 



<210> 57 

<211> 42 

<212> DNA 

<213> artificial sequence 
<220> 

<223> reverse primer restr2 

<400> 57 

acctggatcc gggacttgta gcagtggccc gcggatacca cc 42 



<210> 58 

<211> 50 

<212> DNA 

<213> artificial sequence 
<220> 

<223> binding site for restr3 and restr4 
<220> 

<221> CDS 

<222> (3)..(50) 
<223> 

<400> 58 

cc act ggc acg aag tgc ctc atc tct ggc tgg ggc aac act gcg agc 47 
Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser 
1 5 10 15 

tct 50 
Ser 



<210> 59 

<211> 16 

<212> PRT 

<213> artificial sequence 
<220> 

<223> binding site for restr3 and restr4 

<400> 59 

Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser Ser 
1 5 10 15 

<210> 60 

<211> 50 

<212> DNA 

<213> artificial sequence 
<220> 

<223> forward primer restr3 

<400> 60 

ccactggcac gaagtgcctc atctctggct ggggcaacac tgcgagctct 50 

<210> 61 

<211> 50 

<212> DNA 

<213> artificial sequence 
<220> 

<223> reverse primer restr4 

<400> 61 

agagctagca gtgttgcccc agccagagat gaggcacttg gtaccagtgg 50 

<210> 62 

<211> 30 

<212> DNA 

<213> artificial sequence 
<220> 

<223> primer puc-forward 

<400> 62 

ggggtacccc accaccatga atccactcct 30 



<210> 63 

<211> 30 

<212> DNA 

<213> artificial sequence 
<220> 

<223> primer puc-reverse 

<400> 63 

cgggatccgg tatagagact gaagagatac 30 



<210> 64 

<211> 39 

<212> DNA 

<213> artificial sequence 
<220> 

<223> oligox-SDR1f 
<220> 

<221> misc_feature 

<222> (14)..(31) 

<223> any nucleotide 
<220> 

<221> misc_feature 

<222> (14)..(31) 

<223> any nucleotide or amino acid residue 
<220> 

<221> CDS 

<222> (2)..(37) 
<223> 

<400> 64 

g ggc cac tgc tac nnn nnn nnn nnn nnn nnn aag tcc cg 39 

Gly His Cys Tyr Xaa Xaa Xaa Xaa Xaa Xaa Lys Ser 
1 5 10 



<210> 65 

<211> 12 

<212> PRT 

<213> artificial sequence 
<220> 

<221> misc_feature 

<222> (5)..(5) 

<223> The 'Xaa' at location 5 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (6)..(6) 

<223> The 'Xaa' at location 6 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (7)..(7) 

<223> The 'Xaa' at location 7 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (8)..(8) 

<223> The 'Xaa' at location 8 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (9)..(9) 

<223> The 'Xaa' at location 9 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (10)..(10) 

<223> The 'Xaa' at location 10 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<223> oligox-SDR1f 
<220> 

<221> misc_feature 

<222> (14)..(31) 

<223> any nucleotide 
<220> 

<221> misc_feature 

<222> (14)..(31) 

<223> any nucleotide or amino acid residue 

<400> 65 

Gly His Cys Tyr Xaa Xaa Xaa Xaa Xaa Xaa Lys Ser 
1 5 10 



<210> 66 

<211> 45 

<212> DNA 

<213> artificial sequence 
<220> 

<223> oligox-SDR1r 
<220> 

<221> misc_feature 

<222> (16) . . (33) 

<223> any nucleotide 

<400> 66 

cgcccggtga cgatgnnnnn nnnnnnnnnn nnnttcaggg cctag 



<210> 67 

<211> 47 

<212> DNA 

<213> artificial sequence 
<220> 

<223> oligox-SDR2f 
<220> 

<221> CDS 

<222> (2).. (46) 
<223> 
<220> 

<221> misc_feature 

<222> (29) . . (43) 

<223> any nucleotide or amino acid residue 
<400> 67 

c aag tgc ctc atc tct ggc tgg ggc aac nnn nnn nnn nnn nnn act g 47 
Lys Cys Leu Ile Ser Gly Trp Gly Asn Xaa Xaa Xaa Xaa Xaa Thr 
1 5 10 15 



<210> 68 
<211> 15 
<212> PRT 

<213> artificial sequence 
<220> 

<221> misc_feature 

<222> (10)..(10) 

<223> The 'Xaa' at location 10 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (11)..(11) 

<223> The 'Xaa' at location 11 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (12)..(12) 

<223> The 'Xaa' at location 12 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (13)..(13) 

<223> The 'Xaa' at location 13 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<221> misc_feature 

<222> (14)..(14) 

<223> The 'Xaa' at location 14 stands for Lys, Asn, Arg, Ser, Thr, Ile, 
Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon, 
Tyr, Trp, Cys, or Phe. 

<220> 

<223> oligox-SDR2f 
<220> 

<221> misc_feature 

<222> (29)..(43) 

<223> any nucleotide or amino acid residue 

<400> 68 

Lys Cys Leu Ile Ser Gly Trp Gly Asn Xaa Xaa Xaa Xaa Xaa Thr 
1 5 10 15 

<210> 69 

<211> 55 

<212> DNA 

<213> artificial sequence 
<220> 

<223> oligox-SDR2r 
<220> 

<221> misc_feature 

<222> (33).. (47) 

<223> any base 
<220> 

<221> misc_feature 

<222> (33) . . (47) 

<223> any nucleotide 

<400> 69 

catggttcac ggagtagaga ccgaccccgt tgnnnnnnnn nnnnnnntga cgatc 55 

<210> 70 

<211> 59 

<212> DNA 

<213> artificial sequence 
<220> 

<223> primer SDR1-mutnnb-forward 
<220> 

<221> misc_feature 

<222> (24)..(40) 

<223> N=A, C, G, T; B=C, G, T; V=A, C, G 
<400> 70 

tggtatccgc gggccactgc tacnnbnnbn nbnnbnnbnn baagtcccgg atccaggtg 



<210> 71 

<211> 52 

<212> DNA 

<213> artificial sequence 
<220> 

<223> primer SDR2-mutnnb-reverse 
<220> 

<221> misc_feature 

<222> (20)..(33) 

<223> N=A, C, G, T; B=C, G, T; V=A, C, G 



<400> 71 

ggcgccagag ctagcagtvn nvnnvnnvnn vnngttgccc cagccagaga tg 

<210> 72 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant g SDR1 

<400> 72 

Ala Phe Phe Asn Gly Asp 
1 5 

<210> 73 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant g SDR2 

<400> 73 

Arg Lys Asp Pro Trp 
1 5 



WO 2004/113521 



<210> 74 

<211> 234 

<212> PRT 

<213> artificial sequence 



<220> 

<223> artificial sequence 
<400> 74 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val 
1 5 10 15 

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu 
20 25 30 

Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Phe Asn Gly Lys 
35 40 45 

Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu 
50 55 60 

Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln 
65 70 75 80 

Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser 
85 90 95 

Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr 
100 105 110 

Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn 
115 120 125 

Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu 
130 135 140 

Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu Ala 
145 150 155 160 

Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu 
165 170 175 

Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val 
180 185 190 

Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys Ala 
195 200 205 

Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val Lys 
210 215 220 

Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser 
225 230 



<210> 75 
<211> 234 

<212> PRT 

<213> artificial sequence 
<220> 

<223> artificial sequence 
<400> 75 

Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val 
1 5 10 15 

Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu 
20 25 30 

Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Phe Asn Gly Lys 
35 40 45 

Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Gly Val Leu Glu 
50 55 60 

Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln 
65 70 75 80 

Tyr Asp Trp Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser 
85 90 95 

Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr 
100 105 110 

Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn 
115 120 125 

Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Phe Pro Asp Glu 
130 135 140 

Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Thr Lys Cys Glu Ala 
145 150 155 160 

Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu 
165 170 175 

Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val 
180 185 190 

Arg Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys Ala 
195 200 205 

Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val Lys 
210 215 220 

Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser 
225 230 

<400> 75 

ggcgccagag ctagcagtnn nnnnnnnnnn nnngttgccc cagccagaga tg 52 



<210> 76 

<211> 12 

<212> PRT 

<213> artificial sequence 

<220> 

<223> substrate A 



<400> 76 

Leu Leu Trp Leu Gly Arg Val Val Gly Gly Pro Val 
1 5 10 



<210> 77 

<211> 12 

<212> PRT 

<213> artificial sequence 
<220> 

<223> substrate B 

<400> 77 

Lys Lys Trp Leu Gly Arg Val Pro Gly Gly Pro Val 
1 5 10 



<210> 78 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant1 SDR1 

<400> 78 

Asp Ala Val Gly Arg Asp 
1 5 



<210> 79 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant2 SDR1 

<400> 79 

Asn Gly Arg Asp Leu Glu 
1 5 



<210> 80 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant3 SDR1 

<400> 80 

Gly Phe Val Met Phe Asn 
1 5 



<210> 81 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 
<223> variant1 SDR2 
<400> 81 

Arg Val His Pro Ser 



<210> 82 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant2 SDR2 

<400> 82 

Val Arg Gly Thr Trp 
1 5 



<210> 83 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant3 SDR2 

<400> 83 

Arg Ser Pro Leu Thr 



<210> 84 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant a SDR1 

<400> 84 

Arg Pro Trp Asp Pro Ser 



<210> 85 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant b SDR1 

<400> 85 

Gly Phe Val Met Phe Asn 
1 5 



<210> 86 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant c SDR1 



WO 2004/113521 PCT/EP2004/051172 

<400> 86 

Glu Ile Ala Asn Arg Glu 
1 5 



<210> 87 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant d SDR1 



<400> 87 
Lys Ala Val Val Gly Thr 
1 5 



<210> 88 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant e SDR1 

<400> 88 

Val Asn Ile Met Ala Ala 
1 5 



<210> 89 

<211> 6 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant f SDR1 

<400> 89 

Ala Ala Phe Asn Gly Asp 
1 5 



<210> 90 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant a SDR2 

<400> 90 

Val His Pro Thr Ser 
1 5 



<210> 91 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant b SDR2 

<400> 91 

Arg Ser Pro Leu Thr 
1 5 
<210> 92 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant c SDR2 



<400> 92 

Arg Gly Ala Arg Thr 
1 5 



<210> 93 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant d SDR2 



<400> 93 

Arg Thr Pro Ile Ser 
1 5 



<210> 94 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant e SDR2 



<400> 94 

Thr Thr Ala Arg Lys 
1 5 



<210> 95 

<211> 5 

<212> PRT 

<213> artificial sequence 
<220> 

<223> variant f SDR2 



<400> 95 

Arg Lys Asp Phe Trp 
1 5 



<210> 96 

<211> 157 

<212> PRT 

<213> Homo sapiens 



<400> 96 

Val Arg Ser Ser Ser Arg Thr Pro Ser Asp Lys Pro Val Ala His Val 
1 5 10 15 

Val Ala Asn Pro Gln Ala Glu Gly Gln Leu Gln Trp Leu Asn Arg Arg 
20 25 30 

Ala Asn Ala Leu Leu Ala Asn Gly Val Glu Leu Arg Asp Asn Gln Leu 
35 40 45 

Val Val Pro Ser Glu Gly Leu Tyr Leu Ile Tyr Ser Gln Val Leu Phe 
50 55 60 

Lys Gly Gln Gly Cys Pro Ser Thr His Val Leu Leu Thr His Thr Ile 
65 70 75 80 

Ser Arg Ile Ala Val Ser Tyr Gln Thr Lys Val Asn Leu Leu Ser Ala 
85 90 95 

Ile Lys Ser Pro Cys Gln Arg Glu Thr Pro Glu Gly Ala Glu Ala Lys 
100 105 110 

Pro Trp Tyr Glu Pro Ile Tyr Leu Gly Gly Val Phe Gln Leu Glu Lys 
115 120 125 

Gly Asp Arg Leu Ser Ala Glu Ile Asn Arg Pro Asp Tyr Leu Leu Phe 
130 135 140 

Ala Glu Ser Gly Gln Val Tyr Phe Gly Ile Ile Ala Leu 
145 150 155 